Concentration Inequalities and Confidence Bands for Needlet 
Density Estimators on Compact Homogeneous Manifolds 



Gérard Kerkyacharian, Richard Nickl and Dominique Picard

LPMA, University of Cambridge and Université Paris Diderot
This Version: February 2011, First Version: November 2010 



Abstract 

Let $X_1, \dots, X_n$ be a random sample from some unknown probability density $f$ defined on a compact homogeneous manifold $M$ of dimension $d \ge 1$. Consider a 'needlet frame' $\{\phi_{j\eta}\}$ describing a localised projection onto the space of eigenfunctions of the Laplace operator on $M$ with corresponding eigenvalues less than $2^{2j}$, as constructed in Geller and Pesenson [2010]. We prove non-asymptotic concentration inequalities for the uniform deviations of the linear needlet density estimator $f_n(j)$ obtained from an empirical estimate of the needlet projection $\sum_\eta \phi_{j\eta} \int f \phi_{j\eta}$ of $f$. We apply these results to construct risk-adaptive estimators and nonasymptotic confidence bands for the unknown density $f$. The confidence bands are adaptive over classes of differentiable and Hölder-continuous functions on $M$ that attain their Hölder exponents.

MSC 2000: 62G07, 60E15, 42C40 



1 Introduction 

We consider the problem of constructing confidence bands for an unknown probability density $f$ based on a sample $X_1, \dots, X_n$ from $f$ observed on the $d$-dimensional compact homogeneous manifold $M$. The classical statistical applications occur when $M$ equals the $d$-dimensional unit sphere $\mathbb{S}^d$ of $\mathbb{R}^{d+1}$: if $d = 1$ this corresponds to estimating a periodic univariate density, and recent interest lies mostly in the case $d = 2$, strongly motivated by statistical problems in astrophysics; see Baldi et al. [2009] for an account of typical problems and applications in astrophysics and directional statistics more generally. In Baldi et al. [2009] a recent construction of wavelet type bases on $\mathbb{S}^d$ due to Narcowich et al. [2006a] - who called these new basis functions needlets - was employed to construct risk-adaptive estimators for $f(x)$, $x \in \mathbb{S}^d$, by a local needlet series with support concentrated in a neighborhood of $x$. See also Kerkyacharian et al. [2011] for similar results in the spherical deconvolution problem. The main advantage of this approach is that it shares none of the drawbacks of classical approaches: kernel methods do not take the manifold structure of the sphere well into account, orthogonal series methods associated with spherical harmonics have very poor pointwise (and even worse uniform) performance since spherical harmonics are not well localized but spread out all over the sphere, and methods based on stereographic projections of the sphere onto the plane use a distorted approximation-theoretic paradigm. In contrast, needlets form a tight frame built from the spherical harmonics; they are highly localized and allow for optimal approximation not only in $L^2$ but in general $L^p$-spaces, including $L^\infty$, which is particularly relevant in the problem of constructing confidence bands for $f$. Moreover the localization property is of crucial importance since in astrophysical data sets some parts of the sphere (sky) may not be covered by the observations, so that non-local procedures may suffer severely from missing data points.

The main contributions of the present article are three-fold. First, building on recent results on wavelets and approximation of functions on manifolds in Geller and Mayeli [2009], Geller and Pesenson [2010], we show how needlet estimators $f_n(j,y)$, $y \in M$, with resolution level $j \ge 0$, can be defined also on the more general class of compact homogeneous differentiable manifolds $M$, which includes, next to $d$-dimensional unit spheres, also other relevant examples such as real and complex projective spaces, or Grassmann and Stiefel manifolds. The main idea behind this construction is to use tools from harmonic analysis on compact Lie groups that allow one to build a localized frame on the eigenfunctions of a second order elliptic differential Laplace operator on $M$, which in the case of the sphere coincides with the construction of Narcowich et al. [2006a], where these eigenfunctions are precisely the spherical harmonics.

The second goal of this article is to prove non-asymptotic concentration inequalities for the uniform fluctuations

$$\sup_{y \in M} |f_n(j,y) - E f_n(j,y)|$$

of needlet estimators $f_n(j)$ around the needlet projections $E f_n(j) = A_j(f)$ of the unknown density $f$. The constants in these concentration inequalities depend in a natural way on the manifold and we derive reasonably tight constants for the case $M = \mathbb{S}^d$, $d \ge 1$. We present both Bernstein-type bounds and inequalities based on Rademacher-symmetrization in a similar vein as in recent work in Koltchinskii [2006], Giné and Nickl [2010b], Lounici and Nickl [2010].



The third goal is to use the above concentration inequalities to construct estimators and confidence bands for the unknown density $f : M \to \mathbb{R}$. Even the problem of spherical confidence bands seems not to have been addressed in the literature so far - one reason may be that the classical approach in the univariate case (Bickel and Rosenblatt [1973]) via extreme value theory does not straightforwardly generalise to sample spaces with a different geometric structure. Our concentration inequalities hold on arbitrary compact homogeneous manifolds and can be used directly to construct estimators and nonasymptotic confidence bands for the unknown density $f$ if one has a priori control of the approximation error of $f$ by its needlet projection $A_j(f)$ (the 'bias' of estimation), which by results in Geller and Pesenson [2010] is equivalent to classical Hölderian smoothness conditions for $f$ on $M$.

Since knowledge of the bias is usually not available, the question arises of how to choose $j$, and to what extent adaptive estimators and confidence bands can be constructed. It is known on the one hand (Low [1997]) that adaptive and honest confidence bands in nonparametric function estimation problems cannot exist over the entirety of the usual smoothness classes (in our case, Hölder-balls on $M$). Recent work in this field, however, can be interpreted as a new way of looking at this problem: one can devise statistically relevant subsets of the usual smoothness function classes for which adaptive confidence bands do exist. One example comes from shape constrained nonparametric regression, see, e.g., Dümbgen [2003]. Other examples are 'self-similar functions' that attain their Hölder exponent - see Picard and Tribouley [2000] in the case of the Gaussian white noise model and regression framework and Giné and Nickl [2010a] in density estimation on the real line. Moreover, building on Jaffard [2000]'s work on the Frisch-Parisi conjecture (Frisch and Parisi [1983]), Giné and Nickl [2010a] proved that 'generic' subsets (in the Baire-sense) of the class of Hölder balls can be constructed for which asymptotically honest adaptive confidence bands exist.

In the present paper we follow the line of Picard and Tribouley [2000] and Giné and Nickl [2010a], but take a nonasymptotic approach. We propose an adaptive procedure $\hat{j}_n$ based on Lepski's method (Lepski [1991]) to choose the resolution level $j$ for the needlet estimator $f_n(j)$ in a data-driven way. The resulting estimator $f_n(\hat{j}_n)$ adapts to the unknown smoothness of $f$ in sup-norm risk. In our main result we devise an analytic condition on the approximation errors of $f$ by its needlet projections $A_j(f)$ under which we can establish both an asymptotic and a nonasymptotic coverage result for confidence bands for $f$ over arbitrary subsets $\Omega$ of $M$ that are centered at $f_n(\hat{j}_n)$, and we show that this band adapts to the unknown smoothness of $f$ in the minimax sense. Intuitively the results in Giné and Nickl [2010a] suggest that adaptation is possible for functions $f : M \to \mathbb{R}$ that attain their Hölder exponent, and indeed we prove that our analytic condition can be interpreted in terms of classical Hölder regularity properties of $f$. The proof of this result is somewhat delicate and we detail it only in the case $\mathbb{S}^d$, where the representation of the projector onto spherical harmonics in terms of Gegenbauer polynomials allows for explicit derivations.

Let us finally remark that even in the univariate case $\mathbb{S}^1$ our nonasymptotic approach to confidence bands gives an alternative to the more classical asymptotic techniques based on extreme value theory, as initiated in the classical paper Bickel and Rosenblatt [1973], and as also used in the adaptive context in Giné and Nickl [2010a]. Not surprisingly the results obtained via a nonasymptotic approach have limitations, but in contrast to the classical asymptotic theory referred to above, the present results give precise conditions for what is necessary to obtain coverage in finite samples.



2 Compact Homogeneous Manifolds and Needlets 



We summarize here some facts on compact homogeneous manifolds and Lie groups (see Helgason [1978, 2000], Warner [1983], Faraut [2008] for general references), and the construction and essential properties of the associated needlet frame due to Geller and Mayeli [2009], Geller and Pesenson [2010], generalising the spherical case considered in Narcowich et al. [2006a].



2.1 Compact Lie Groups and the Laplace Operator 

Let $M$ be a compact connected differentiable ($C^\infty$-) manifold of dimension $\dim(M) = d$. A compact Lie group $G$ of dimension $r$ is said to act on $M$ via

$$(g, x) \in G \times M \mapsto g.x \in M$$

if a) this action is, for every $g \in G$, a diffeomorphism of $M$, if b) $g_1 g_2.x = g_1.(g_2.x)$ holds for every $g_1, g_2 \in G$, $x \in M$, if c) the identity $e \in G$ satisfies $e.x = x$ and if d) for every $g \in G$, $g \neq e$, there exists a point $x \in M$ such that $g.x \neq x$. A group $G$ acts transitively on $M$ if in addition

for every $x, y \in M$ there exists $g \in G$ s.t. $g.x = y$.

A compact manifold $M$ is said to be homogeneous if it is a compact connected differentiable manifold on which a compact Lie group acts transitively. Examples include the $d$-dimensional unit sphere $\mathbb{S}^d$ of $\mathbb{R}^{d+1}$, projective spaces, Stiefel and Grassmann manifolds, see p. 125 in Warner [1983] and also Wang [1952] for the two-point homogeneous case.



Any compact homogeneous manifold $M$ can be realised as a quotient $G/K$ where $K$ is a closed subgroup of $G$. More precisely, if we fix once and for all a point $x_0 \in M$, and let $K = \{h \in G, h.x_0 = x_0\}$ be the closed isotropy subgroup at $x_0$, then $M$ is diffeomorphic to $G/K$ and the canonical projection $\pi : g \in G \mapsto \bar{g} = \{gh, h \in K\} \in G/K$ is continuous, onto and verifies $\pi(g_1 g_2) = g_1.\pi(g_2)$, see Warner [1983], p. 123 onwards. Moreover the image of the Haar measure on $G$ under $\pi$,

$$\int_G f(\pi(g))\,dg = \int_{G/K} f(x)\,dx = \int_M f(x)\,dx,$$

is a natural "Haar" measure $dx$ on $M$, invariant under the action of $G$. (It is the unique $G$-invariant measure on $M$ up to a scaling factor.) The usual Lebesgue spaces on $M$ are denoted by $L^p(M) := L^p(M, dx)$, $1 \le p \le \infty$. Since $G$ is compact, $dx$ is bi-invariant: for $f \in L^1(M)$ and $g \in G$ let us define $L_g(f)(x) = f(g^{-1}x)$, $R_g(f)(x) = f(xg)$, then

$$\int_M L_g(f)(x)\,dx = \int_M f(x)\,dx = \int_M R_g(f)(x)\,dx.$$

The Lie algebra $Lie(G)$ of $G$ is characterized by the fact that

$$X \in Lie(G) \mapsto e^X \in G,$$

and since $G$ is compact, this mapping is onto. Let us recall that we have the $Ad$ representation of $G$ in $Lie(G)$:

$$g \in G \mapsto Ad(g)X = gXg^{-1} \in Lie(G), \quad \text{and} \quad g e^X g^{-1} = e^{Ad(g)X},$$

and there exists a Euclidean structure $\langle \cdot, \cdot \rangle$ on $Lie(G)$ for which $Ad$ is unitary, that is, such that

$$\forall g \in G,\ \forall X, Y \in Lie(G), \quad \langle Ad(g)X, Ad(g)Y \rangle = \langle X, Y \rangle, \quad |X|^2 = \langle X, X \rangle, \tag{1}$$

see Proposition 6.1.1 in Faraut [2008].

Every $X \in Lie(G)$ generates a vector field on $G$ so that we can define a one parameter group

$$t \mapsto e^{tX} \in G, \quad t \in \mathbb{R},$$

and since $G$ is connected we can define a metric on $G$ by the 'length' $|X|$ of the 'shortest geodesic' joining two points $g_1, g_2 \in G$,

$$d_G(g_1, g_2) = \inf\{|X|,\ e^X g_1 = g_2\} = \inf\{|X|,\ g_1 e^X = g_2\}. \tag{2}$$

The two previous definitions are equivalent, as

$$e^X g_1 = g_2 \iff g_1 g_1^{-1} e^X g_1 = g_2 \iff g_1 e^{Ad(g_1^{-1})X} = g_2, \quad |Ad(g_1^{-1})X| = |X|,$$

and it is not difficult to verify that this metric is bi-invariant:

$$\forall g_1, g_2, g \in G, \quad d_G(g_1, g_2) = d_G(g g_1, g g_2) = d_G(g_1 g, g_2 g).$$

Every $X \in Lie(G)$ also naturally generates a one parameter group on $M$:

$$t \in \mathbb{R} \mapsto e^{tX}.x \in M$$

which describes geodesics of the Riemannian structure on $M$ associated to the Euclidean structure $\langle \cdot, \cdot \rangle$ on $Lie(G)$. The metric on $M$ is given by

$$d_M(x, y) = \inf\{|X|,\ e^X.x = y\} = d_{G/K}(x, y) = \inf\{d_G(g_1, g_2),\ \pi(g_1) = x,\ \pi(g_2) = y\}.$$

So $d_M(\pi(g), \pi(g')) \le d_G(g, g')$. Moreover

$$\forall g \in G,\ x, y \in M, \quad d_M(g.x, g.y) = d_M(x, y).$$

This is again due to (1) as

$$e^X.x = y \iff g e^X g^{-1}.g.x = g.y \iff e^{Ad(g)X}.g.x = g.y, \quad \text{and} \quad |X| = |Ad(g)X|.$$

Now similarly every $X \in Lie(G)$ gives rise to a one-parameter group on $L^p(M)$, $1 \le p \le \infty$, given by

$$f \mapsto T_t(f)(x) = f(e^{tX}.x); \quad t \in \mathbb{R},\ x \in M,\ f \in L^p(M),$$

and we denote the infinitesimal generator of this one-parameter group by $D_X$, so

$$D_X f(x) = \frac{d}{dt} f(e^{tX}.x)\Big|_{t=0}, \quad x \in M,$$

the derivative of $f$ at $x$ in the direction of the $X$-geodesic.

If $X_i$, $i = 1, \dots, r$, is an orthonormal basis of $Lie(G)$ with respect to the scalar product induced by the adjoint representation, the sum

$$\mathcal{L} = \sum_{i=1}^{r} X_i^2$$

defines the Casimir operator, which is independent of the choice of the basis, and which is a central element of the enveloping algebra of $Lie(G)$. Associated to the Casimir operator is the following operator on $L^2(M)$ (we keep the same notation $\mathcal{L}$)

$$\mathcal{L} = D_{X_1}^2 + D_{X_2}^2 + \dots + D_{X_r}^2.$$

The operator $-\mathcal{L}$, which is often called the Laplace operator, is a second order, positive, elliptic differential operator defined on the space $C^\infty(M)$ of infinitely differentiable functions on $M$. In fact $-\mathcal{L}$ can be closed to give a positive, self-adjoint second order elliptic differential operator on $L^2(M)$ with a discrete spectrum of eigenvalues $\lambda_k$, $k \in \mathbb{N}$, arranged in increasing and divergent order. By the spectral theorem the corresponding eigenfunctions $\{e_k\}_{k \in \mathbb{N}}$ constitute an orthonormal basis of $L^2(M)$, and we define, for $n \in \mathbb{N}$, the closed finite-dimensional subspaces $E_n = E_n(M)$ of $L^2(M)$ spanned by eigenfunctions of $\mathcal{L}$ whose corresponding eigenvalues do not exceed $n$, formally

$$E_n(M) := \Big\{ x \mapsto \sum_{k : \lambda_k \le n} c_k e_k(x) \ :\ c_k \in \mathbb{R},\ \lambda_k \text{ an eigenvalue of } -\mathcal{L} \Big\}.$$

2.2 Connection to the Laplace-Beltrami Operator

The operator $\mathcal{L}$ need not necessarily coincide with the Laplace-Beltrami operator on $M$, but it does in several important cases. If $M$ is a two-point homogeneous space then $\mathcal{L}$ equals, up to a scaling constant, the Laplace-Beltrami operator, see Proposition 4.11 in Chapter II of Helgason [2000]. By Wang [1952]'s classification of such spaces this includes, among others, the $d$-dimensional unit sphere, real and certain complex projective spaces. Further examples of manifolds where the Laplace-Beltrami operator coincides with $-\mathcal{L}$ are given in Geller and Pesenson [2010]. Since this connection is of some interest in applications, we discuss this point here in some more detail.

The Laplace operator $\mathcal{L}$ is left- and right-invariant and symmetric with respect to the inner product $\langle \cdot, \cdot \rangle$ induced by the adjoint representation, see p. 162 in Faraut [2008]. By the general theory of irreducible unitary representations of compact Lie groups (e.g., Theorem 6.4.1 and Proposition 8.2.1 in Faraut [2008])

$$L^2(M) = \bigoplus_j V_j, \quad V_j = \ker(\mathcal{L} - c_j I)$$

for constants $c_j$, and $\forall g \in G$, $L_g(V_j) \subset V_j$,

$$g \in G \mapsto L_g \in Lin(V_j)$$

is a finite dimensional unitary irreducible representation of $G$, where $Lin(V_j)$ denotes the space of bounded linear operators on $V_j$.

Moreover, as a Riemannian manifold, $M$ is equipped with a Laplace-Beltrami operator $\Delta$ which commutes with the $G$-action: $\forall g \in G$, $\Delta L_g = L_g \Delta$. If $M$ is compact:

$$L^2(M) = \bigoplus_k \mathcal{H}_k, \quad \mathcal{H}_k = \ker(\Delta - \lambda_k I).$$

Moreover $\mathcal{H}_k$ is $G$-invariant ($\forall g \in G$, $L_g(\mathcal{H}_k) \subset \mathcal{H}_k$), so

$$g \in G \mapsto L_g \in Lin(\mathcal{H}_k)$$

is a finite dimensional unitary representation of $G$.

Clearly, if $\Phi_k(x, y)$ is the kernel of the projection operator onto $\mathcal{H}_k$, then $\phi_k(y) = \Phi_k(x_0, y)$ verifies $\|\phi_k\|_2^2 = \phi_k(x_0) = \dim(\mathcal{H}_k)$ and is moreover a zonal function (recall that $f$ is zonal if $\forall h \in K$, $L_h(f) = f$, see, e.g., Giné [1975], Helgason [2000]). If the space of zonal functions in $\mathcal{H}_k$ is of dimension 1 then $g \in G \mapsto L_g \in Lin(\mathcal{H}_k)$ is an irreducible representation. If this is the case for all $\mathcal{H}_k$ then $\mathcal{L}$ and the Laplace-Beltrami operator will coincide, provided we can check that the eigenvalues are the same.

Let us illustrate this in the case of $M = \mathbb{S}^d$, where

$$G = SO(d+1) = \{A \in M(d+1 \times d+1),\ A^{-1} = A^t\},$$
$$Lie(G) = so(d+1) = \{X \in M(d+1 \times d+1),\ -X = X^t\}$$

and we can take

$$\langle X, Y \rangle = \tfrac{1}{2} \mathrm{Tr}(XY^t).$$

An orthonormal basis is then given by

$$X_{i,j} = E_{i,j} - E_{j,i}, \quad 1 \le i < j \le d+1, \quad E_{i,j} = (e^{i,j}_{k,l})_{k,l}, \quad e^{i,j}_{k,l} = \delta_{i,k}\delta_{j,l}.$$

We take $x_0 = (1, 0, \dots, 0)$ so $K \cong SO(d)$ and

$$\forall x, y \in M = \mathbb{S}^d, \quad d_{\mathbb{S}^d}(x, y) = \arccos(\langle x, y \rangle_{\mathbb{R}^{d+1}}).$$

The eigenvalues of $\Delta$ are $\lambda_k = -k(k+d-1)$ (Faraut [2008]), the space $\mathcal{H}_k$ equals the space of spherical harmonic functions of degree $k$, and there is only one zonal function in each $\mathcal{H}_k$ (which is given through Gegenbauer polynomials) so the induced representations are irreducible (and not equivalent). To see that $\Delta = -\mathcal{L}$ it is enough to compute the eigenvalue of $\mathcal{L}$ on $\mathcal{H}_k$ and this can be carried out in the case of the sphere using the explicit expression

n 2 



2.3 A Smoothed Projection onto the Span of the Eigenfunctions of $-\mathcal{L}$

We shall write $\langle g, h \rangle$ from now on for the standard $L^2(M)$-inner product of two functions $g, h \in L^2(M) := L^2(M, dx)$. We also denote by $\|g\|_\Omega = \sup_{y \in \Omega} |g(y)|$ the supremum norm of $g : M \to \mathbb{R}$ over $\Omega \subset M$, and we shall write $\|g\|_\infty$ when $\Omega = M$.

Let $0 \le a \le 1$ be an infinitely differentiable nonnegative function defined on $[0, \infty)$. We require $a$ to be identically 1 on $[0, 1/2]$ and compactly supported on $[0, 1]$. Define the sequence of linear operators $A_j$, $j \ge 0$, with

$$A_0 f = \int_M f(x)\,dx, \qquad A_j f(x) := A_j(f)(x) = \int_M A_j(x, y) f(y)\,dy, \quad j > 0,$$

where, for $L_k(x, y) = e_k(x) e_k(y)$,

$$A_j(x, y) := \sum_k a\Big(\frac{\lambda_k}{2^{2j}}\Big) L_k(x, y) = \sum_{k : \lambda_k \le 2^{2j}} a\Big(\frac{\lambda_k}{2^{2j}}\Big) e_k(x) e_k(y).$$

Clearly

$$\langle A_j f, f \rangle = \sum_k a\Big(\frac{\lambda_k}{2^{2j}}\Big) \langle L_k f, f \rangle \le \|f\|_2^2, \quad \forall f \in L^2(M),$$

from Parseval's identity and since $|a| \le 1$. Since $a$ is identically one on $[0, 1/2]$,

$$h \in E_{2^{2j-1}}(M) \text{ implies } A_j(h) = h \tag{3}$$

and since $\bigcup_{n \ge 1} E_n(M)$ is dense in $L^2(M)$ we conclude

$$\lim_{j \to \infty} \|A_j f - f\|_2 = 0$$

for every $f \in L^2(M)$. Thus $A_j$ furnishes us with an approximation of the identity operator on $L^2(M)$.
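
As a numerical illustration of the smoothed projection kernel, the following minimal sketch specialises to $M = \mathbb{S}^2$. It rests on assumptions that are not part of the construction above: the particular cutoff built from $\exp(-1/s)$ is only one admissible choice of $a$, the zonal kernels are taken as $L_k(u) = \frac{2k+1}{4\pi} P_k(u)$ with $u = \langle x, y \rangle$ and $P_k$ the Legendre polynomial (unnormalised surface measure, eigenvalues $k(k+1)$), and the names `cutoff_a` and `kernel_A_sphere` are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre


def _h(s):
    # exp(-1/s) for s > 0, extended by 0 for s <= 0 (smooth on the real line)
    return np.where(s > 0, np.exp(-1.0 / np.where(s > 0, s, 1.0)), 0.0)


def cutoff_a(t):
    """One possible cutoff a: smooth, equal to 1 on [0, 1/2] and 0 on [1, inf)."""
    s = 2.0 * np.asarray(t, dtype=float) - 1.0
    return 1.0 - _h(s) / (_h(s) + _h(1.0 - s))


def kernel_A_sphere(u, j):
    """A_j(x, y) on S^2 as a function of u = <x, y> (assumed zonal form)."""
    k = np.arange(2 ** j + 1)          # k(k+1) <= 2^{2j} forces k < 2^j
    weights = cutoff_a(k * (k + 1) / 4.0 ** j) * (2 * k + 1) / (4 * np.pi)
    return legendre.legval(u, weights)  # sum_k weights[k] * P_k(u)


# Gauss-Legendre nodes turn integrals over S^2 of zonal functions into
# 2*pi times a one-dimensional integral over u in [-1, 1].
nodes, w = legendre.leggauss(32)
total_mass = 2 * np.pi * np.sum(w * kernel_A_sphere(nodes, 3))
reproduced = 2 * np.pi * np.sum(w * nodes * kernel_A_sphere(nodes, 2))
```

Under these assumptions `total_mass` checks that $A_j$ preserves constants, and `reproduced` checks the reproducing property (3) on the degree-one harmonic $y \mapsto \langle x, y \rangle$ evaluated at $y = x$; both should be close to 1.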

The kernel $A_j$ can be 'split' as follows: If we define

$$C_j(x, y) = \sum_{k : \lambda_k \le 2^{2j}} \sqrt{a\Big(\frac{\lambda_k}{2^{2j}}\Big)}\, L_k(x, y)$$

then due to the orthogonality properties of the $L_k$'s we see

$$A_j(x, y) = \int_M C_j(x, u) C_j(u, y)\,du. \tag{4}$$

2.4 Gauss Cubature Formula and Needlets on a Manifold



The following quadrature formula holds on $E_k(M)$, see Theorem 5.3 in Geller and Pesenson [2010]. For every $k \in \mathbb{N}$ there exists a finite subset $\mathcal{X}_k$ of $M$ of cardinality $|\mathcal{X}_k| \le C k^{d/2}$ and positive real numbers $b_\eta := b_\eta(k) > 0$, indexed by the elements $\eta$ of $\mathcal{X}_k$, such that

$$\forall f \in E_k(M), \quad \int_M f(x)\,dx = \sum_{\eta \in \mathcal{X}_k} b_\eta f(\eta). \tag{5}$$



The kernel $C_j$ defined above clearly satisfies $z \mapsto C_j(x, z) \in E_{2^{2j}}(M)$ for every $x \in M$, and Theorem 6.1 in Geller and Pesenson [2010] states that

$$f, g \in E_n(M) \implies fg \in E_{4rn}(M), \tag{6}$$

so we deduce $z \mapsto C_j(x, z) C_j(z, y) \in E_{r2^{2j+2}}(M)$. Note that it is property (6) where homogeneity of the manifold is used crucially. It is in the same spirit as (but not equivalent to) the addition formula for eigenfunctions of the Laplace-Beltrami operator on a Riemannian manifold (see Giné [1975]). Combining (6) with (5) thus implies

$$A_j(x, y) = \int_M C_j(x, z) C_j(z, y)\,dz = \sum_{\eta \in \mathcal{X}_{r2^{2j+2}}} b_\eta\, C_j(x, \eta) C_j(\eta, y)$$



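and the action of $A_j$ on $L^2(M)$ can hence be represented as

$$A_j f(x) = \int_M A_j(x, y) f(y)\,dy = \int_M \sum_{\eta \in \mathcal{X}_{r2^{2j+2}}} b_\eta\, C_j(x, \eta) C_j(\eta, y) f(y)\,dy = \sum_{\eta \in \mathcal{X}_{r2^{2j+2}}} \sqrt{b_\eta}\, C_j(x, \eta) \int_M \sqrt{b_\eta}\, C_j(\eta, y) f(y)\,dy.$$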



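This motivates the definition of the needlet scaling function $\phi_{j\eta}$ indexed by the cubature points $\eta \in Z_j$,

$$\phi_{j\eta}(x) := \sqrt{b_\eta}\, C_j(x, \eta); \quad \eta \in Z_j = \mathcal{X}_{r2^{2j+2}}.$$

With this notation we can write

$$A_j f(x) = \sum_{\eta \in Z_j} \langle \phi_{j\eta}, f \rangle \phi_{j\eta}(x) \tag{7}$$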



and call this approximation the needlet projection of $f$ onto $E_{r2^{2j+2}}(M)$ at resolution level $j$. We shall need below the following estimates on the cubature set, see Geller and Pesenson [2010],



(8) 



for some explicit constants $k_1, k_2 > 0$.

Although we shall not explicitly use it in what follows, we can telescope the needlet projections in the usual way to obtain a wavelet-type multiresolution approximation

$$A_j f = A_0 f + \sum_{0 \le i \le j-1} \sum_{\eta} \langle \psi_{i\eta}, f \rangle \psi_{i\eta}$$

of a function $f$ on a compact homogeneous manifold by needlets



with $c(y) = \sqrt{a(y/2) - a(y)}$. See Section 8 of Geller and Pesenson [2010] for details. In particular



$\forall f \in L^2(M)$:
and the $(\psi_{j\eta})$'s form a tight frame of $L^2(M)$:



l<j r,(LZi 



as j — > oo, 



$$\forall f \in L^2(M), \quad \|f\|_2^2 = \sum_{j} \sum_{\eta} |\langle f, \psi_{j\eta} \rangle|^2. \tag{9}$$



2.5 Properties of the Needlet Frame 

We establish some key properties of needlets, including their near-exponential localization 
property. 

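Proposition 1 We have, for some constant $0 < D_1(M) < \infty$ and every $j \ge 0$, $\eta \in Z_j$,

$$\|\phi_{j\eta}\|_2 \le 1, \qquad \|\phi_{j\eta}\|_\infty \le D_1(M) 2^{jd/2}. \tag{10}$$

Moreover, for every $x \in M$, $\eta \in Z_j$ and every $N \in \mathbb{N}$ there exists a constant $c_N$ such that

$$|\phi_{j\eta}(x)| \le \frac{c_N 2^{jd/2}}{(1 + 2^j d_M(x, \eta))^N}. \tag{11}$$

Proof. For the first inequality in (10), let $\eta \in M$, $n \in \mathbb{N}$ and note

$$\int_M \Big( \sum_{k : \lambda_k \le n} L_k(x, \eta) \Big)^2 dx = \sum_{k : \lambda_k \le n} L_k(\eta, \eta).$$

On the other hand

$$x \mapsto \Big( \sum_{k : \lambda_k \le n} L_k(x, \eta) \Big)^2 \in E_{4rn}(M),$$

so if $\mathcal{X}_{4rn}$ is the set of cubature points of $E_{4rn}(M)$ and $\eta \in \mathcal{X}_{4rn}$,

$$\int_M \Big( \sum_{k : \lambda_k \le n} L_k(x, \eta) \Big)^2 dx = \sum_{\xi \in \mathcal{X}_{4rn}} b_\xi \Big( \sum_{k : \lambda_k \le n} L_k(\xi, \eta) \Big)^2 \ge b_\eta \Big( \sum_{k : \lambda_k \le n} L_k(\eta, \eta) \Big)^2$$

so, combining these estimates,

$$b_\eta \sum_{k : \lambda_k \le n} L_k(\eta, \eta) \le 1$$

for every $\eta \in \mathcal{X}_{4rn}$. This implies, for every $\eta \in Z_j$,

$$\int_M \phi_{j\eta}^2(x)\,dx = b_\eta \sum_k a(\lambda_k / 2^{2j}) L_k(\eta, \eta) \le 1.$$

To prove the remaining claims, recall that by definition

$$\phi_{j\eta}(x) = \sqrt{b_\eta} \sum_{k : \lambda_k \le 2^{2j}} \sqrt{a(\lambda_k / 2^{2j})}\, L_k(x, \eta).$$

For $f$ a function from the Schwartz-class on $\mathbb{R}_+$, Lemma 4.1 (and the remark after it) in Geller and Mayeli [2009], applied to the elliptic operator $f(\mathcal{L}/2^{2j})$ (notation of functional calculus, $t = 2^{-2j}$ in their lemma), proves that for every integer $N > 0$ there exists a constant $c_N(f)$ such that

$$\Big| \sum_k f(\lambda_k / 2^{2j}) L_k(x, \eta) \Big| \le \frac{c_N(f)\, 2^{jd}}{(1 + 2^j d_M(x, \eta))^N}. \tag{12}$$

Applying this to $f = \sqrt{a}$, we infer the second bound in (10), and (11) follows from (8) and (12). ■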

Proposition 2 We have

$$\sup_{x \in M} \int_M A_j^2(x, y)\,dy \le D_2(M) 2^{jd}, \qquad \sup_{x, y \in M} |A_j(x, y)| \le D_2(M) 2^{jd} \tag{13}$$

for some finite positive constant $D_2(M)$ that depends only on the manifold.

Proof. As $A_j(x, y) := \sum_k a(\lambda_k / 2^{2j}) L_k(x, y)$, the second claim follows from (12) with $f = a$. For the first

$$\int_M A_j^2(x, y)\,dy = \int_M \sum_k a\Big(\frac{\lambda_k}{2^{2j}}\Big) L_k(x, y) \sum_l a\Big(\frac{\lambda_l}{2^{2j}}\Big) L_l(x, y)\,dy = \sum_k a^2\Big(\frac{\lambda_k}{2^{2j}}\Big) L_k(x, x)$$

and again using (12) with $f = a^2$ gives the result. ■

2.6 The case of $\mathbb{S}^d$

In the case of the $d$-dimensional unit sphere $\mathbb{S}^d$ of $\mathbb{R}^{d+1}$ the above construction is effectively the one in Narcowich et al. [2006a]. On $\mathbb{S}^d$ the differential operator $\mathcal{L}$ coincides with the usual Laplace-Beltrami operator, and we have

$$L^2(\mathbb{S}^d) = \bigoplus_k \mathcal{H}_k, \quad \mathcal{H}_k = \mathcal{H}_k(\mathbb{S}^d) = \ker(\Delta - \lambda_k I), \quad \lambda_k = -k(k+d-1).$$

The eigenfunctions $e_k$ in this case are the spherical harmonics with eigenvalues $k(k+d-1)$ (e.g., Proposition 9.3.5 in Faraut [2008]). Thus if we take the subsequence $N = N_k$ of $\mathbb{N}$ for which $k(k+d-1) = N_k$ as $k$ runs through the nonnegative integers, then the spaces $E_{N_k}(\mathbb{S}^d)$ correspond to the spaces $\mathcal{P}_k(\mathbb{S}^d)$ of spherical polynomials of degree less than or equal to $k$, which are spanned by the mutually orthogonal spaces $\mathcal{H}_l(\mathbb{S}^d)$, $0 \le l \le k$, of spherical harmonics, see Faraut [2008], Stein and Weiss [1971].

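If $\{e_k^i\}$ is any orthonormal basis of $\mathcal{H}_k$, then we write, in slight abuse of notation,

$$L_k(x, y) = \sum_i e_k^i(x) e_k^i(y) = L_k(\langle x, y \rangle_{d+1}), \qquad \langle x, y \rangle_{d+1} = \sum_{i=1}^{d+1} x_i y_i,$$

$$|\mathbb{S}^d|\, L_k(u) = \Big(1 + \frac{k}{\nu}\Big) C_k^\nu(u), \quad \nu = \frac{d-1}{2}, \quad u \in [-1, 1],$$

where $C_k^\nu$ is the corresponding Gegenbauer polynomial, and $|\mathbb{S}^d|$ is the Lebesgue measure of $\mathbb{S}^d$, i.e., $|\mathbb{S}^d| = \int_{\mathbb{S}^d} dx = (2\pi^{(d+1)/2}) / \Gamma((d+1)/2)$. We have furthermore (p. 144 in Stein and Weiss [1971]) for every $x \in \mathbb{S}^d$,

$$L_k(x, x) = \sum_i (e_k^i(x))^2 = \frac{\dim(\mathcal{H}_k(\mathbb{S}^d))}{|\mathbb{S}^d|}$$

and thus

$$|\mathbb{S}^d|\, L_k(1) = \dim(\mathcal{H}_k(\mathbb{S}^d)). \tag{14}$$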

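Moreover, for $d \ge 2$ and any $n \in \mathbb{N}$, $\mathcal{P}_n(\mathbb{S}^d) = \bigoplus_{k=0}^{n} \mathcal{H}_k(\mathbb{S}^d)$ and as a consequence, by Stein and Weiss [1971],

$$\dim(\mathcal{H}_k(\mathbb{S}^d)) = \binom{k+d}{k} - \binom{k-2+d}{k-2} = \frac{(d+k-2)!\,(d+2k-1)}{k!\,(d-1)!},$$

$$\dim(\mathcal{P}_n(\mathbb{S}^d)) = \binom{n+d}{n} + \binom{n+d-1}{n-1} = \frac{2}{d!}(n+1)(n+2)\cdots(n+d-1)\Big(n + \frac{d}{2}\Big) = \frac{2}{d!}\, n^d \Big(1 + \frac{1}{n}\Big)\Big(1 + \frac{2}{n}\Big)\cdots\Big(1 + \frac{d-1}{n}\Big)\Big(1 + \frac{d}{2n}\Big).$$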



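So, for $d \ge 2$, $n \ge 2$,

$$\frac{2}{d!}(n+1)^d \le \dim(\mathcal{P}_n(\mathbb{S}^d)) \le (n+1)^d,$$

$$\frac{2}{d!}\, n^d \le \dim(\mathcal{P}_{n-1}(\mathbb{S}^d)) \le n^d \quad \text{and} \quad \dim(\mathcal{P}_{n-1}(\mathbb{S}^1)) = 2n - 1.$$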
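
The dimension formulas and the resulting polynomial bounds can be checked in exact integer arithmetic. The following is a small sketch for that purpose only; the function names are hypothetical, and the binomial expressions are those quoted from Stein and Weiss [1971] above, with the convention that a binomial coefficient with negative lower index is zero.

```python
from math import comb, factorial


def dim_harmonics(k, d):
    """dim H_k(S^d) as a difference of two binomial coefficients."""
    lower = comb(k - 2 + d, k - 2) if k >= 2 else 0
    return comb(k + d, k) - lower


def dim_polynomials(n, d):
    """dim P_n(S^d), obtained by summing dim H_k(S^d) over 0 <= k <= n."""
    return sum(dim_harmonics(k, d) for k in range(n + 1))


def check_bounds(n, d):
    """Check (2/d!) n^d <= dim P_{n-1}(S^d) <= n^d without floating point."""
    dim = dim_polynomials(n - 1, d)
    return 2 * n ** d <= factorial(d) * dim and dim <= n ** d


# On S^2 the harmonics of degree k span a space of dimension 2k + 1,
# so polynomials of degree at most n span a space of dimension (n + 1)^2.
dims_s2 = [dim_harmonics(k, 2) for k in range(5)]
all_ok = all(check_bounds(n, d) for d in range(2, 7) for n in range(2, 40))
```

Multiplying through by $d!$ keeps the comparison in integers, which avoids any rounding issue when $d$ and $n$ grow.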

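By virtue of these bounds the constants in Proposition 2 can be explicitly calculated. To obtain a unified notation define, for $j \in \mathbb{N}$, the integers $k(j) = \max\{k \in \mathbb{N} : \lambda_k = k(k+d-1) \le 2^{2j}\}$ so that $k(j) < 2^j$ always holds. Then

$$\int_{\mathbb{S}^d} A_j^2(x, y)\,dx = \sum_{k : \lambda_k \le 2^{2j}} [a(\lambda_k / 2^{2j})]^2 L_k(1) = \frac{1}{|\mathbb{S}^d|} \sum_{k : \lambda_k \le 2^{2j}} [a(\lambda_k / 2^{2j})]^2 \dim(\mathcal{H}_k(\mathbb{S}^d)) \le \frac{\dim(\mathcal{P}_{k(j)}(\mathbb{S}^d))}{|\mathbb{S}^d|} \le \frac{2^{jd}}{|\mathbb{S}^d|}$$

and these inequalities imply that the same bound holds for $|A_j(x, y)|$. We can also deduce, as in the proof of Proposition 1,

$$\|\phi_{j\eta}\|_\infty = \sqrt{b_\eta} \sum_{k : \lambda_k \le 2^{2j}} \sqrt{a(\lambda_k / 2^{2j})}\, L_k(1) \le \sqrt{\sum_{k : \lambda_k \le 2^{2j}} L_k(1)} \le \sqrt{\frac{2^{jd}}{|\mathbb{S}^d|}}.$$

Conclude that the key constants $D_1(M)$, $D_2(M)$ in the last subsection can be taken to be

$$D_1(\mathbb{S}^d) = \frac{1}{\sqrt{|\mathbb{S}^d|}}, \qquad D_2(\mathbb{S}^d) = \frac{1}{|\mathbb{S}^d|} \tag{15}$$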

in the case of the unit sphere. Finally we should remark that in the case of the unit sphere the addition formula (6) holds with $4rn$ replaced by $2n$ as one is multiplying spherical polynomials. [Indeed whenever the Laplace-Beltrami operator coincides with $\mathcal{L}$ one can use the addition formula for eigenfunctions of the Laplacian in Giné [1975].] Moreover, if $d = 2$, for each resolution level $j$, the HEALPix pixelisation (commonly used for astrophysical data) gives $12 \cdot 2^{2j}$ cubature points, so $k_2 = 12$ in (8).



3 Linear Needlet Density Estimators and Concentration Properties of their Uniform Fluctuations

Let $X, X_1, \dots, X_n$ be i.i.d. random variables taking values in a compact homogeneous manifold $M$ of dimension $d$. Denote their common law by $P$ and assume that $P$ possesses a density $f : M \to [0, \infty)$ w.r.t. $dx$ on $M$. Denote further by $P_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i}$ the empirical measure of the sample. Let $A_j(x, y)$ be the needlet projection kernel. For $j \in \mathbb{N}$, the linear needlet density estimator of $f$ is defined as

$$f_n(j, y) = \frac{1}{n} \sum_{i=1}^{n} A_j(X_i, y) = \int_M A_j(x, y)\,dP_n(x), \quad y \in M. \tag{16}$$

We shall often write, in slight abuse of notation, $f_n(j)$ for $f_n(j, \cdot)$.
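
To make definition (16) concrete, here is a minimal sketch for the simplest case $M = \mathbb{S}^1$, i.e. periodic density estimation on $[0, 2\pi)$. Several ingredients are assumptions made for illustration: the eigenvalues are taken as $k^2$ with $L_k(x, y) = \frac{1}{\pi}\cos(k(x-y))$ for $k \ge 1$, the cutoff built from $\exp(-1/s)$ is just one admissible $a$, the von Mises sample stands in for real data, and all function names are hypothetical.

```python
import numpy as np


def cutoff_a(t):
    """One possible cutoff a: smooth, equal to 1 on [0, 1/2] and 0 on [1, inf)."""
    def h(s):
        return np.where(s > 0, np.exp(-1.0 / np.where(s > 0, s, 1.0)), 0.0)
    s = 2.0 * np.asarray(t, dtype=float) - 1.0
    return 1.0 - h(s) / (h(s) + h(1.0 - s))


def circle_kernel(delta, j):
    """A_j(x, y) on S^1 as a function of the angle difference delta = x - y."""
    k = np.arange(1, 2 ** j + 1)
    weights = cutoff_a(k ** 2 / 4.0 ** j)
    delta = np.asarray(delta, dtype=float)
    series = np.cos(delta[..., None] * k) @ weights
    return (1.0 + 2.0 * series) / (2.0 * np.pi)


def needlet_density_estimate(sample, points, j):
    """f_n(j, y) = (1/n) sum_i A_j(X_i, y), evaluated at every y in `points`."""
    return circle_kernel(points[:, None] - sample[None, :], j).mean(axis=1)


rng = np.random.default_rng(0)
sample = rng.vonmises(mu=0.0, kappa=2.0, size=300)   # synthetic angles
grid = 2.0 * np.pi * np.arange(256) / 256
estimate = needlet_density_estimate(sample, grid, j=3)
mass = estimate.mean() * 2.0 * np.pi                 # integral over the circle
```

Because the estimate is a trigonometric polynomial of low degree, the equispaced average used for `mass` integrates it exactly, and the result should equal 1 up to rounding. Note that a linear estimator of this kind need not be nonnegative everywhere.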

3.1 A Bernstein-type Concentration Inequality for Needlet Estimators 

We define now some quantities that measure the 'Gaussian' and 'Poissonian' fluctuations of the uniform deviations of the centered estimator $f_n(j)$. Recall the explicit constants $D_1(M)$, $|Z_j| \le k_2 2^{jd}$ from (8), (10) in the previous section. Note moreover that the second estimate in Proposition 1 immediately implies

$$2^{jd/2} c_0(M, j) = \sup_{x \in M} \sum_{\eta \in Z_j} |\phi_{j\eta}(x)| \le 2^{jd/2} C(M). \tag{17}$$

The constant $c_0(M, j) = c_0(M, j, a, Z_j)$ (or an upper bound for it) can be computed explicitly after the regularizing function $a$ and the quadrature set $Z_j$ have been chosen, and a sharp numerical evaluation of it is important in applications of Proposition 3 below. Define then

$$\sigma(n, l, x) := a(x, l) \sqrt{\frac{2^{ld}}{n}} + a'(x, l) \frac{2^{ld}}{n}$$

where

$$a(x, l) := a(M, f, x, l) := c_0(M, l) \sqrt{2 (\log(2|Z_l|) + x) \|f\|_\infty}$$

and

$$a'(x, l) := a'(M, x, l) := c_0(M, l)\, \tfrac{2}{3} D_1(M) (\log(2|Z_l|) + x).$$



We now prove the following concentration inequality for the needlet density estimator. 

Proposition 3 Let $M$ be a compact homogeneous manifold and suppose $f : M \to [0, \infty)$ is bounded. We have, for every $n \in \mathbb{N}$, every $j \in \mathbb{N}$ and every $x > 0$,

$$\Pr\Big\{ \sup_{y \in M} |f_n(j, y) - E f_n(j, y)| \ge \sigma(n, j, x) \Big\} \le e^{-x}.$$

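Proof. The explicit cubature formula for eigenfunctions of $\mathcal{L}$ allows us to reduce the infinite supremum $\sup_{y \in M} |f_n(j, y) - E f_n(j, y)|$ to one over a finite set, so that finite-dimensional probabilistic methods can be applied. Indeed, the estimate (17) implies that the supremum of any $h \in E_{2^{2j-1}}(M)$ over $M$ can be bounded by the (finite) maximum of the needlet coefficients of $h$: Clearly from (7)

$$\forall h \in E_{2^{2j-1}}(M), \quad h(x) = A_j h(x) = \sum_{\eta \in Z_j} \langle \phi_{j\eta}, h \rangle \phi_{j\eta}(x)$$

so that for $Z_j$ a cubature set of $E_{r2^{2j+2}}(M)$ one has

$$\sup_{x \in M} |h(x)| \le \max_{\eta \in Z_j} |\langle \phi_{j\eta}, h \rangle| \sup_{x \in M} \sum_{\eta \in Z_j} |\phi_{j\eta}(x)| = 2^{jd/2} c_0(M, j) \max_{\eta \in Z_j} |\langle \phi_{j\eta}, h \rangle|. \tag{18}$$

Now using $\langle \cdot, \cdot \rangle$ notation also acting on finite signed measures,

$$\|f_n(j) - E f_n(j)\|_\infty = \sup_{y \in M} \Big| \sum_{\eta \in Z_j} \phi_{j\eta}(y) \langle \phi_{j\eta}, P_n - P \rangle \Big| \le 2^{jd/2} c_0(M, j) \max_{\eta \in Z_j} |\langle \phi_{j\eta}, P_n - P \rangle|$$

by (17) above. Consider the finite empirical process indexed by the class of functions $\{\phi_{j\eta_k}\}_{k=1}^{|Z_j|}$ which has envelope $U = 2^{jd/2} D_1(M)$ in view of (10). The class of functions

$$\mathcal{G} := \{\phi_{j\eta_1}/2U, \dots, \phi_{j\eta_{|Z_j|}}/2U\}$$

is thus uniformly bounded by $1/2$ and its weak variances $\sigma^2$ satisfy

$$\sup_{g \in \mathcal{G}} E g^2(X) \le \sigma^2 := \frac{\|f\|_\infty}{2^{jd+2} D_1^2(M)}$$

since $\|\phi_{j\eta}\|_2 \le 1$ (again (10)). Recall Bernstein's inequality (e.g., p. 26 in Massart [2007]): If $Z_1, \dots, Z_n$ are i.i.d. centered random variables bounded in absolute value by 1 then

$$\Pr\Big\{ \Big| \frac{1}{n} \sum_{i=1}^{n} Z_i \Big| \ge \sqrt{\frac{2tv}{n}} + \frac{t}{3n} \Big\} \le 2e^{-t} \tag{19}$$

where $v \ge E Z_1^2$. Therefore, using the notation $\|\mu\|_{\mathcal{G}} = \sup_{g \in \mathcal{G}} |\int g\,d\mu|$ for signed measures $\mu$,

$$\Pr\Big\{ \|f_n(j) - E f_n(j)\|_\infty \ge c_0(M, j) \Big[ \sqrt{\frac{2(\log(2|Z_j|) + x)\, 2^{jd} \|f\|_\infty}{n}} + \frac{2U 2^{jd/2} (\log(2|Z_j|) + x)}{3n} \Big] \Big\}$$

$$\le \Pr\Big\{ \max_{\eta \in Z_j} |\langle \phi_{j\eta}, P_n - P \rangle| \ge \sqrt{\frac{2(\log(2|Z_j|) + x) \|f\|_\infty}{n}} + \frac{2U (\log(2|Z_j|) + x)}{3n} \Big\}$$

$$= \Pr\Big\{ \|P_n - P\|_{\mathcal{G}} \ge \sqrt{\frac{2(\log(2|Z_j|) + x) \|f\|_\infty}{D_1^2(M) 2^{jd+2}\, n}} + \frac{\log(2|Z_j|) + x}{3n} \Big\}$$

$$= \Pr\Big\{ \max_{m = 1, \dots, |Z_j|} \Big| \frac{1}{n} \sum_{i=1}^{n} (g_m(X_i) - E g_m(X)) \Big| \ge \sqrt{\frac{2(\log(2|Z_j|) + x) \sigma^2}{n}} + \frac{\log(2|Z_j|) + x}{3n} \Big\}$$

$$\le \sum_{m=1}^{|Z_j|} \Pr\Big\{ \Big| \frac{1}{n} \sum_{i=1}^{n} (g_m(X_i) - E g_m(X)) \Big| \ge \sqrt{\frac{2(\log(2|Z_j|) + x) \sigma^2}{n}} + \frac{\log(2|Z_j|) + x}{3n} \Big\}$$

$$\le 2|Z_j| \exp\{-\log(|Z_j|) - \log 2 - x\} = e^{-x},$$

which completes the proof of Proposition 3. ■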
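
The threshold appearing in the first line of the Bernstein chain can be evaluated numerically once the constants are available. The sketch below is only illustrative: it assumes the threshold has the form $c_0\sqrt{2t\,2^{jd}\|f\|_\infty/n} + 2c_0 D_1 2^{jd} t/(3n)$ with $t = \log(2|Z_j|) + x$ and $U = 2^{jd/2} D_1(M)$, as read off from the proof, and it treats $c_0(M, j)$, $D_1(M)$, $|Z_j|$ and $\|f\|_\infty$ (or upper bounds for them) as user-supplied inputs. The function name and argument names are hypothetical.

```python
import math


def bernstein_threshold(n, j, x, d, c0, D1, card_Z, f_sup):
    """Deviation level exceeded with probability at most exp(-x).

    c0, D1, card_Z and f_sup stand for c_0(M, j), D_1(M), |Z_j| and the
    sup-norm of f; they are assumptions supplied by the caller.
    """
    t = math.log(2 * card_Z) + x
    scale = 2.0 ** (j * d)
    gaussian = c0 * math.sqrt(2.0 * t * f_sup) * math.sqrt(scale / n)
    poissonian = c0 * (2.0 / 3.0) * D1 * t * scale / n
    return gaussian + poissonian


# Toy inputs chosen so that t = 2 and both terms can be checked by hand:
# Gaussian part 2 * 0.1 = 0.2, Poissonian part (2/3) * 2 / 100.
value = bernstein_threshold(n=100, j=0, x=2.0 - math.log(2.0), d=2,
                            c0=1.0, D1=1.0, card_Z=1, f_sup=1.0)
```

The split into a Gaussian part of order $\sqrt{2^{jd}/n}$ and a Poissonian part of order $2^{jd}/n$ makes it easy to see which term dominates for a given sample size and resolution level.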

We should mention that a minor modification of the proof of Proposition 3 combined with the usual blocking arguments (as, e.g., in Theorem 1 in Giné and Nickl [2009]) implies under standard conditions on $j_n$ (including $2^{-j_n} \simeq n^{-\eta}$ for some $0 < \eta < 1$) that

$$\limsup_n \sqrt{\frac{n}{2^{j_n d} j_n}} \sup_{y \in M} |f_n(j_n, y) - E f_n(j_n, y)| \le D \quad \text{almost surely} \tag{20}$$

where the constant $D$ depends only on $M$, $k_2$ and $\|f\|_\infty$.

In some proofs below we shall need that a(n,l,x) is monotone increasing in I G N. In 
general whether this holds true or not depends on the cubature Zi as well as on the function 
a. Monotonicity of a(n,l,x) can be easily ensured if we replace a(x,l) and a'(x,l) by their 
upper bounds a(x,l),a'(x,l) obtained from the inequalities \Z\\ < ^2^, co(M, /) < C(M). 
While we do not advocate this in practice, for the theoretical development we define 



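$$\sigma(n,l,x) = \bar a(x,l) \sqrt{\frac{2^{ld}}{n}} + \bar a'(x,l) \frac{2^{ld}}{n}, \qquad A(n,l,x) := \bar a(x,l) + \bar a'(x,l) \sqrt{2^{ld}/n}, \qquad (21)$$

where $\bar a(x,l)$, $\bar a'(x,l)$ denote the upper bounds just described. The constant $A(n,l,x)$ allows for $\sigma(n,l,x)$ to be written as a constant multiple of the 'Gaussian component' $\sqrt{2^{ld}/n}$, that is, $\sigma(n,l,x) = A(n,l,x) \sqrt{2^{ld}/n}$.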
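The relation between $\sigma(n,l,x)$ and $A(n,l,x)$ in (21) can be checked numerically. The sketch below is only an illustration: the two functions `a_bar` and `a_bar_prime` are hypothetical stand-ins for the upper bounds in (21), whose actual form depends on the cubature and on the constants of Proposition 3.

```python
import math

def a_bar(x, l):
    """Hypothetical stand-in for the 'Gaussian' constant; increasing in l."""
    return math.sqrt(l + 1.0 + x)

def a_bar_prime(x, l):
    """Hypothetical stand-in for the 'Poissonian' constant; increasing in l."""
    return l + 1.0 + x

def sigma(n, l, x, d):
    """sigma(n,l,x) = a_bar*sqrt(2^{ld}/n) + a_bar_prime*2^{ld}/n, as in (21)."""
    r = 2.0 ** (l * d) / n
    return a_bar(x, l) * math.sqrt(r) + a_bar_prime(x, l) * r

def big_a(n, l, x, d):
    """A(n,l,x) = a_bar + a_bar_prime*sqrt(2^{ld}/n), as in (21)."""
    r = 2.0 ** (l * d) / n
    return a_bar(x, l) + a_bar_prime(x, l) * math.sqrt(r)

n, d, x = 5000, 2, 1.0
values = [sigma(n, l, x, d) for l in range(8)]
```

With constants that are nondecreasing in $l$, as the stand-ins above are, $\sigma(n,l,x)$ is increasing in $l$, which is the monotonicity used in the proofs.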



3.2 Concentration Inequalities via Rademacher Processes on Manifolds 

Despite its conceptual simplicity the approach from the previous section has one drawback: the uniform deviations of $f_n - Ef_n$ are controlled globally on $\mathbb M$ by the function $\sigma(n,l,x)$, which is constant on $\mathbb M$. For functions $f$ that exhibit spatially inhomogeneous regularity properties it is of interest to have a 'localised' version of $\sigma(n,l,x)$. This could be achieved in Proposition 3 by means of proving a 'local' analogue of (18), which, however, is a rather intricate matter that we do not pursue here. Instead we show how a simple symmetrization technique can be used to deal with this problem. This is inspired by Koltchinskii [2006] and also Giné and Nickl [2010b]. For $\Omega$ any subset of $\mathbb M$, define a Rademacher process $\{(1/n) \sum_{i=1}^n \varepsilon_i A_j(X_i, y)\}_{y \in \Omega}$ and set

$$R_n(\Omega, j) = \sup_{y \in \Omega} \left| \frac{1}{n} \sum_{i=1}^n \varepsilon_i A_j(X_i, y) \right|$$

with $(\varepsilon_i)_{i=1}^n$ an i.i.d. Rademacher sequence, independent of the $X_i$'s (and defined on a large product probability space). $R_n(\Omega,j)$ can be computed in practice by first simulating $n$ i.i.d. random signs, applying these signs to the summands $A_j(X_i)$ of the needlet density estimator, and maximizing the resulting function. The idea is that the supremum $R_n(\Omega, j)$ of the symmetrized process serves as a random surrogate for the unknown supremum $\sup_{y \in \Omega} |f_n(j, y) - Ef_n(j,y)|$ of the centered process. Indeed Proposition 4 shows that $\sup_{y \in \Omega} |f_n(y) - Ef_n(y)|$ concentrates around (a constant multiple of) $R_n(\Omega, j)$. Define the deviation term



$$\sigma^R(\Omega,n,j,x) = 6R_n(\Omega,j) + 10\sqrt{\frac{2^{jd} D_2(\mathbb M) \|f\|_\infty (x + \log 2)}{n}} + 22 \, \frac{2^{jd} D_2(\mathbb M)(2x + 2\log 2)}{n}.$$
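The recipe for computing $R_n(\Omega,j)$ (simulate signs, apply them to the kernel summands, maximize) can be sketched as follows. This is a rough illustration under loud assumptions: the kernel `stand_in_kernel` is a Fejér-type kernel on the circle ($d = 1$) used purely as a placeholder and is NOT the needlet kernel $A_j$, and the supremum over $\Omega$ is approximated by a maximum over a finite grid.

```python
import math
import random

def stand_in_kernel(j, x, y):
    """Placeholder for A_j(x, y) on the circle: a normalised Fejer kernel of
    order 2^j. This is an assumption for illustration, not the needlet kernel."""
    order = 2 ** j
    s = math.sin((x - y) / 2.0)
    if abs(s) < 1e-12:
        return order / (2.0 * math.pi)
    return (math.sin(order * (x - y) / 2.0) / s) ** 2 / (order * 2.0 * math.pi)

def rademacher_sup(sample, grid, j, rng, kernel=stand_in_kernel):
    """Approximate R_n(Omega, j) by maximising over a finite grid of points y."""
    n = len(sample)
    signs = [rng.choice((-1.0, 1.0)) for _ in sample]  # i.i.d. random signs
    best = 0.0
    for y in grid:
        total = sum(e * kernel(j, xi, y) for e, xi in zip(signs, sample))
        best = max(best, abs(total) / n)
    return best

rng = random.Random(0)
sample = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(200)]
grid = [2.0 * math.pi * k / 64 for k in range(64)]
r_value = rademacher_sup(sample, grid, j=3, rng=rng)
```

Restricting `grid` to a subregion gives a local version of the quantity, which is the point of working with $\Omega \subset \mathbb M$ rather than all of $\mathbb M$.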



Proposition 4 Let $\mathbb M$ be a compact homogeneous manifold and suppose $f : \mathbb M \to [0, \infty)$ is bounded. We have for every $n \in \mathbb N$, every $j \in \mathbb N$, every $\Omega \subset \mathbb M$ and every $x > 0$ that

$$\Pr\left\{ \sup_{y \in \Omega} |f_n(y,j) - Ef_n(y,j)| > \sigma^R(\Omega,n,j,x) \right\} \le e^{-x}.$$

Proof. We use the following general result for empirical processes. 

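Proposition 5 Let $\mathcal F$ be a countable class of real-valued measurable functions defined on $\mathbb M$, uniformly bounded by 1/2. We have for every $n \in \mathbb N$ and $x > 0$

$$\Pr\left\{ \left\| \frac{1}{n} \sum_{i=1}^n (f(X_i) - Pf) \right\|_{\mathcal F} > 6 \left\| \frac{1}{n} \sum_{i=1}^n \varepsilon_i f(X_i) \right\|_{\mathcal F} + 10 \sqrt{\frac{(x + \log 2)\sigma^2}{n}} + 22 \, \frac{x + \log 2}{n} \right\} \le e^{-x},$$

where $\sigma^2$ is an upper bound for the weak variances $\sup_{f \in \mathcal F} Ef^2(X)$.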



The proof, which is based on Talagrand [1996]'s inequality with constants (e.g., Massart [2007]), is inspired by ideas in Koltchinskii [2006], Giné and Nickl [2010b], and can be found in Proposition 5 in Lounici and Nickl [2010]. Now to prove Proposition 4 note that

$$\|f_n(j) - Ef_n(j)\|_\Omega = \sup_{y \in \Omega} \left| \frac{1}{n} \sum_{i=1}^n (A_j(X_i, y) - EA_j(X, y)) \right|$$



for ft C M. This amounts to studying the empirical process indexed by the class of functions 
{Aj(-,y) : y £ ft} for ft C M. This class has envelope 2^ d D 2 (M) in view of Proposition [5J 
Define thus 

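$$\mathcal G := \mathcal G_j = \{A_j(\cdot,y)/(2^{jd+1} D_2(\mathbb M)) : y \in \Omega\} \qquad (22)$$

which is uniformly bounded by 1/2. [In fact, by continuity of the mapping $y \mapsto A_j(x,y)$ for every $x \in \mathbb M$ we can restrict ourselves to a countable subset of $\Omega$, which we still denote by $\Omega$.] Furthermore the upper bound for the weak variances can be taken to be

$$\frac{\|f\|_\infty}{2^{jd+2} D_2(\mathbb M)} =: \sigma^2 \qquad (23)$$

in view of Proposition 2. Then, recalling the notation $\|\cdot\|_{\mathcal G}$ from the proof of Proposition 3,

$$\Pr\left\{ \|f_n(j,\cdot) - Ef_n(j,\cdot)\|_\Omega > 6R_n(\Omega,j) + 10\sqrt{\frac{2^{jd} D_2(\mathbb M) \|f\|_\infty (x + \log 2)}{n}} + 22 \, \frac{2^{jd} D_2(\mathbb M)(2x + 2\log 2)}{n} \right\}$$

$$= \Pr\left\{ \|P_n - P\|_{\mathcal G} > \frac{6 R_n(\Omega,j)}{2^{jd+1} D_2(\mathbb M)} + 10\sqrt{\frac{\|f\|_\infty (x + \log 2)}{D_2(\mathbb M) 2^{jd+2} n}} + 22 \, \frac{x + \log 2}{n} \right\}$$

$$= \Pr\left\{ \left\| \frac{1}{n} \sum_{i=1}^n (g(X_i) - Pg) \right\|_{\mathcal G} > 6 \left\| \frac{1}{n} \sum_{i=1}^n \varepsilon_i g(X_i) \right\|_{\mathcal G} + 10\sqrt{\frac{(x + \log 2)\sigma^2}{n}} + 22 \, \frac{x + \log 2}{n} \right\}$$

and the last expression is less than or equal to $e^{-x}$ using Proposition 5 with $\mathcal G$ as in (22) and $\sigma$ specified by (23). ■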

It is interesting to compare $\sigma^R$ to $\sigma$ from Proposition 3. On the one hand the second and third terms defining $\sigma^R(\Omega, n, j, x)$ are of a smaller asymptotic order than $\sigma(n, j, x)$ for $j \to \infty$ due to the absence of $|\mathcal Z_j|$ in $\sigma^R$. On the other hand the term $R_n(\Omega,j)$ is random, and one is led to ask whether on average $\sigma^R$ will be larger or smaller than $\sigma$. Our proofs imply, for some constant $C$ independent of $j, n$, that



ERn(^j) < C 



so that $\sigma^R$ has the same size as $\sigma$ as a function of $j, n$, up to constants.

Inspection of the proofs and arguments similar to those in the proof of Proposition 2 in Giné and Nickl [2010b] show that $R_n(\Omega,j)$ in Proposition 4 can be replaced by its (conditional) expectation $E_\varepsilon R_n(\Omega, j)$, a quantity that may be more stable in applications. Moreover, the constants appearing in the definition of $\sigma^R$ may still be fairly conservative: the proof is based on an application of Talagrand [1996]'s inequality with explicit constants (see Massart [2007]), and in the lower deviation version thereof the optimal constants are not known yet.



4 Confidence Bands 

If the size of the bias $\|Ef_n(j) - f\|_\infty$ were known, one could directly use Propositions 3 or 4 and a suitable choice of $j$ to obtain confidence bands with prescribed finite sample coverage. For instance, if $f$ is the uniform distribution (volume element) on $\mathbb M$, the bias $A_0(f) - f$ of the estimate $f_n(0)$ is exactly zero. In analogy, if $f \in E_n(\mathbb M)$ is a finite linear combination of eigenfunctions of $\mathcal L$ (so in the spherical case a polynomial) then the estimator $f_n(J)$ for sufficiently large but finite $J$ also has bias zero (cf. (3)). As usual, going beyond finite-dimensional smoothness classes is possible by considering spaces of differentiable functions on $\mathbb M$. For instance one defines $C^k(\mathbb M)$ as the set of continuous functions $f \in C(\mathbb M)$ such that for all $X_1, X_2, \dots, X_k$ in Lie($G$), $D_{X_1} D_{X_2} \cdots D_{X_k} f \in C(\mathbb M)$. It is a Banach space when equipped with the following norm:



c k = SU P \\D Xl Dx 2 ...Dx k f \\oo + 

\Xi\<l,...,\X k \<l 

and $C^\infty(\mathbb M)$ is the intersection of all the spaces $C^k(\mathbb M)$, $k \in \mathbb N$. One can define such spaces also for noninteger $k$ by introducing a modulus of continuity along vectorial directions $X$, and the resulting scale of Hölder-Zygmund function spaces $C^k(\mathbb M)$ can be characterized by the decay of their needlet coefficients in very much the same way as in the case of Hölder-Zygmund spaces on Euclidean spaces: A $k$-regular function in $C^k(\mathbb M)$, $k > 0$, then satisfies the estimate

$$\|A_j(f) - f\|_\infty \le C 2^{-jk} \qquad (24)$$

See Geller and Pesenson [2010] for these results. If the smoothness degree $t$ of $f$ is known such bounds can be used, together with Propositions 3, 4, in the construction of asymptotic confidence sets, proceeding in the same way as in the classical paper Bickel and Rosenblatt [1973] via choosing a resolution level $j_n$ that leads to 'undersmoothing', i.e., a bias of smaller order as a function of $n$ than the random fluctuations of the centered estimators.

However, in the typical nonparametric function estimation problem the size of the bias is not known, and the above assumptions are far from realistic. So we have the more ambitious goal to obtain confidence sets for the needlet estimator with an automatic choice of the resolution level $j$.

4.1 Estimate of the Resolution Level 

Split the sample into two parts $\mathcal S_1$ and $\mathcal S_2$, each of (integer) size $n_1 > 0$ and $n_2 > 0$ respectively. For asymptotic considerations we shall require that $n_1/n_2$ is bounded away from zero and infinity as $n \to \infty$. Denote by

- ni n 2 

P ni=—Yl §X i ' aTld Pn 2=—Yl 
i=l i=l 

the empirical measures associated with the first and the second subsample, respectively, and define the associated needlet density estimators

$$f_{n_v}(j,y) = \int_{\mathbb M} A_j(x,y) \, dP_{n_v}(x), \qquad y \in \mathbb M, \ v = 1,2.$$

We use the sample $\mathcal S_2$ to choose the resolution level $j$. For $n_2 > 1$, choose an integer $j_{\max} := j_{\max,n}$ and define the grid of candidate bandwidths as

$$\mathcal J := \mathcal J_n = [0, j_{\max}] \cap \mathbb N.$$

For asymptotic considerations we shall only require

$$2^{j_{\max}} \simeq \left( \frac{n_2}{(\log n_2)^2} \right)^{1/d} \qquad (25)$$



but a practical choice is to first choose $l^*$ such that $a(x, l^*)\sqrt{2^{l^* d}/n_2} = a'(x, l^*)(2^{l^* d}/n_2)$ and to define $j_{\max}$ such that $2^{j_{\max}} = 2^{l^*}/(\log n_2)^{1/d}$. Such a choice of $j_{\max}$ is just slightly below the boundary where the Poissonian term starts to dominate the Gaussian term in $\sigma(n_2,l,x)$ in Proposition 3, and choosing $j > j_{\max}$ would then result in inconsistent estimators, so that $j_{\max}$ is a natural upper bound for $\mathcal J$.

The goal is to select a data-driven bandwidth $j_n$ from $\mathcal J_n$. Heuristically, for $l > j$,

$$f_{n_2}(j) - f_{n_2}(l) = [f_{n_2}(j) - Ef_{n_2}(j)] - [f_{n_2}(l) - Ef_{n_2}(l)] + [A_j(f) - f] - [A_l(f) - f]$$

and with large probability the first two terms should not exceed $2\sigma(n_2,l,x)$, a quantity that increases in $l$, and we would like to choose $j_n$ to be the smallest $j$ such that the approximation error $2(A_j(f) - f)$ (which decreases in $j$) does not exceed the size $2\sigma(n_2,l,x)$ of the random fluctuations.

We shall use the subsample $\mathcal S_2$ to select $j_n$ following this idea, which is due to Lepskii [1991], formalised as follows:

$$j_n = \min\left\{ j \in \mathcal J : \|f_{n_2}(j) - f_{n_2}(l)\|_\Omega \le 4\sigma(n,l) \ \ \forall l > j, \ l \in \mathcal J \right\}, \qquad (26)$$

where $\sigma(n,l) = \sigma(n_2, l, \kappa \log n_2)$, cf. (21), and where $\kappa > 0$ is any numerical constant (see Remark 3 for discussion). By definition $j_n = j_{\max}$ if $\forall j$, $\exists l > j$, $l, j \in \mathcal J$, $\|f_{n_2}(j) - f_{n_2}(l)\|_\Omega > 4\sigma(n,l)$.
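The selection rule can be sketched in a few lines. This is an illustration only: `sup_dist(j, l)` is a hypothetical callable returning the sup-norm distance between the estimators at levels $j$ and $l$ (computed from the second subsample), and `threshold(l)` stands for $\sigma(n,l)$; the toy numbers below replace both by deterministic functions and are not from the paper.

```python
def lepski_level(sup_dist, threshold, j_max):
    """Smallest j in {0, ..., j_max} such that sup_dist(j, l) <= 4*threshold(l)
    for every l in {j+1, ..., j_max}; returns j_max if no smaller j qualifies."""
    for j in range(j_max + 1):
        if all(sup_dist(j, l) <= 4.0 * threshold(l)
               for l in range(j + 1, j_max + 1)):
            return j
    return j_max

# Toy example: differences driven only by a bias of size 2^{-j},
# thresholds growing like the 'Gaussian component' 2^{l/2}.
toy_dist = lambda j, l: abs(2.0 ** (-j) - 2.0 ** (-l))
toy_threshold = lambda l: 0.01 * 2.0 ** (l / 2.0)
selected = lepski_level(toy_dist, toy_threshold, j_max=8)
```

The loop always terminates at `j_max` at the latest, because the condition is vacuous there, which matches the convention that $j_n = j_{\max}$ when no smaller level passes all comparisons.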

A few remarks about the constants involved in the definition of $\sigma(n,l)$ are in order: All these constants are explicit once the function $a$ and the cubature $\mathcal Z_j$ have been chosen, except for the quantity $\|f\|_\infty$. If no upper bound for $\|f\|_\infty$ is known we advocate that $\|f\|_\infty$ be replaced by $\|f_n(j_{\max})\|_\infty$. Standard arguments imply that this random quantity exponentially concentrates around $\|f\|_\infty$, see for instance Giné and Nickl [2010b]. Consequently we neglect the case of $\|f\|_\infty$ unknown in what follows in order to reduce technicalities. Moreover we shall see below how the choice of the numerical constant $\kappa$ influences the finite-sample performance, but our results hold for any choice $\kappa > 0$, in particular it does not have to be 'large enough' (as is often assumed in the adaptive estimation literature).

4.2 Confidence Bands with Random Sizes 

To construct the center of the corridor of the confidence band over $\Omega \subset \mathbb M$ we evaluate the linear estimator $f_{n_1}(\cdot,y)$ from (16) at the random bandwidth $j_n$. It turns out that some undersmoothing is useful, in fact crucial, so let $u_n$ be a sequence of natural numbers and define

$$\hat f_n(y) = f_{n_1}(j_n + u_n, y), \qquad y \in \Omega.$$

We shall see below how the sequence $u_n$ influences our results but heuristically, and for asymptotic considerations, one may think of $u_n$ of the order $\log \log n$.

The confidence band we propose is centered at $\hat f_n(y)$, $y \in \Omega$, and has random size

$$s_n(x) = 1.01\,\sigma(n_1, j_n + u_n, x),$$

cf. (21), more precisely

$$C_n := C_n(x,y) = \left[ \hat f_n(y) - s_n(x), \ \hat f_n(y) + s_n(x) \right], \qquad x > 0, \ y \in \Omega \subset \mathbb M. \qquad (27)$$
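As a minimal illustration of (27), the band is a constant-width corridor around the undersmoothed estimator. In the sketch below `center_values` is a hypothetical list of values of the estimator on a finite grid of points $y$, and `sigma_value` stands for $\sigma(n_1, j_n + u_n, x)$; both are assumptions made for illustration.

```python
def confidence_band(center_values, sigma_value, inflation=1.01):
    """Return the list of intervals [c - s, c + s] with s = inflation * sigma_value."""
    s = inflation * sigma_value
    return [(c - s, c + s) for c in center_values]

def covers(band, truth_values):
    """True if the band contains the candidate function at every grid point."""
    return all(lo <= f <= hi for (lo, hi), f in zip(band, truth_values))

# Toy values only.
band = confidence_band([0.9, 1.1, 1.0], sigma_value=0.2)
inside = covers(band, [1.0, 1.0, 1.0])
outside = covers(band, [1.0, 1.0, 1.5])
```

Because the half-width does not depend on $y$, coverage of the whole function is equivalent to a sup-norm bound on the estimation error, which is how the proofs below treat it.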

Alternatively one can use the band size $s_n^R(\Omega,x) = 1.01\,\sigma^R(\Omega,n_1,j_n + u_n,x)$, and all results proved below go through by virtue of Proposition 4 and using techniques from Rademacher processes (as in Giné and Nickl [2010b]), but we abstain from this to reduce technicalities.

4.3 Coverage and Adaptation Properties of $C_n$

4.3.1 Coverage over Eigenspaces of $\mathcal L$: the Finite Dimensional Case

We first consider here the important case where $f$ is a very smooth function, that is, a fixed linear combination of eigenfunctions of $\mathcal L$, so $f \in E_{2^{2J-1}}(\mathbb M)$ for some fixed $J$. For simplicity of exposition let us consider the case of global confidence bands $\Omega = \mathbb M$ only in this subsection. We start with the case where $f$ equals the volume element of $\mathbb M$.

Theorem 1 If $f$ is the volume element of $\mathbb M$, $\int_{\mathbb M} f(x)dx = 1$, then we have, for every $n \in \mathbb N$

$$\Pr(j_n = 0) \ge 1 - 2 j_{\max} n_2^{-\kappa}.$$

Furthermore, for every $n \in \mathbb N$ and every $x > 0$ we have

$$\Pr\{f(y) \in C_n(x, y) \text{ for every } y \in \mathbb M\} \ge 1 - e^{-x} \qquad (28)$$

and, if $2^{u_n d}/n \to 0$ as $n \to \infty$ then $s_n(x) = O_{\Pr}(2^{u_n d/2}/\sqrt{n})$.

In other words our automatic band $C_n$ attains exact finite sample coverage if $f$ is uniformly distributed, and in the usual situation where $u_n = \log \log n$ the size of the band shrinks almost at the parametric rate $1/\sqrt{n}$.

It is instructive to consider next the case where $f \in E_{2^{2J-1}}(\mathbb M) \setminus E_{2^{2J-2}}(\mathbb M)$ for some fixed $J \in \mathbb N$. We would then hope that $j_n = J$ with large probability, as then $A_J(f) - f = 0$ (see (3) above). In the following theorem we restrict ourselves to asymptotic considerations to highlight the main ideas.

Theorem 2 Suppose $f \in E_{2^{2J-1}}(\mathbb M) \setminus E_{2^{2J-2}}(\mathbb M)$ for some fixed $J \in \mathbb N$. We then have that

$$\Pr(j_n \notin [J-1, J]) = O(n_2^{-\kappa} + e^{-cn})$$

as $n \to \infty$ for some constant $c$ that depends on $f$ only through $\|f\|_\infty$ and through

$$b_1(f) = \inf_{p \in E_{2^{2J-2}}(\mathbb M)} \|p - f\|_\infty > 0.$$

Moreover if $u_n \ge 1$ $\forall n \in \mathbb N$ then

$$\Pr\{f(y) \in C_n(x, y) \text{ for every } y \in \mathbb M\} \ge 1 - e^{-x} - O(e^{-cn}) \qquad (29)$$

and if $2^{u_n d}/n \to 0$ as $n \to \infty$ then $s_n(x) = O_{\Pr}(2^{u_n d/2}/\sqrt{n})$.

Thus the confidence band $C_n$ has asymptotic coverage for any fixed spherical polynomial, and the asymptotic size of the band $C_n$ is of order $1/\sqrt{n}$ up to the undersmoothing factor.

Clearly we have neglected the question of honesty of C n , that is we have not addressed 
the question whether (j29|) holds uniformly in / G Uo<j<j-iE 2 j ' (M). Inspection of the proof 
implies that $C_n$ is honest over linear combinations of eigenfunctions of $\mathcal L$ for which the separation constants $b_1(f)$ are bounded below by a constant multiple of $1/\sqrt{n}$. That uniformity over all densities between $E_{2^{2J-1}}$ and $E_{2^{2J-2}}$ cannot be attained for our 'adaptive' procedure is related to impossibility results for post-model selection estimators in finite-dimensional models, see Leeb and Pötscher [2006].

4.3.2 Asymptotic Coverage over Hölder Balls

Theorem 2 just resembles the finite dimensional situation, and if it were indeed known a priori
that $f \in E_{2^{2J-1}}(\mathbb M)$ for a fixed $J$ one could simply use $f_n(J)$ as an estimator, circumventing
the uniformity problems raised in the previous subsection. However if no finite-dimensional
model seems realistic for / we may accept these uniformity problems for which is not 

well-behaved if in return our procedure performs well in the infinite-dimensional setting. Note 
that in the usual infinite-dimensional nonparametric models the default estimator $f_n(j_{\max})$ has only a logarithmic rate of convergence to zero in supremum norm risk, and will lead to unnecessarily large confidence bands. In contrast our confidence band $C_n$ adapts over an infinite-dimensional class of Hölder continuous densities $f$ as we show in this section.

Our first main result is that the size of the band $C_n$ equals, with large probability, the optimal band size that one would obtain from balancing approximation error $A_j(f) - f$ and random fluctuations $f_n(j) - A_j(f)$. For asymptotic considerations this will imply that our band shrinks at the optimal rate of convergence depending on the regularity of $f$. To formalize this statement we shall impose a regularity condition on the density $f$, namely that its approximation errors $\|A_j(f) - f\|_\Omega$ are bounded by a constant multiple of $2^{-jt}$ for some unknown $t > 0$. As mentioned in (24) above this is tantamount to assuming a classical $t$-Hölder condition on $f$. The theoretical bandwidth that balances bias and variance is then, up to additive constants (see (38) below for an exact definition)

$$j_n^*(t) = \frac{1}{2t+d} \left( \log_2 n - \log_2 \log n \right).$$

Theorem 3 (Size of the band) Let $\Omega$ be any subset of $\mathbb M$. Suppose $f : \mathbb M \to [0, \infty)$ is bounded and that $\|A_j(f) - f\|_\Omega \le b_2 2^{-jt}$ for some $b_2 > 0$ and some $t > 0$. Let $2s_n(x)$ be the diameter of the band $C_n(x, y)$. Then, for every $n \in \mathbb N$, $x > 0$,

$$\Pr\{s_n(x) > 1.01\,\sigma(n_1, j_n^*(t) + u_n + 1, x)\} \le 2(j_{\max} - j_n^*(t)) n_2^{-\kappa}.$$

In particular, if the undersmoothing constants $u_n$ are such that

$$r_n(t) := \left( \frac{\log n}{n} \right)^{\frac{t}{2t+d}} 2^{u_n d/2} = o(1)$$

as $n \to \infty$ then $s_n(x) = O_{\Pr}(r_n(t))$.
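To get a feeling for the size of the band, the rate can be evaluated numerically. The sketch assumes the rate reads $r_n(t) = (\log n / n)^{t/(2t+d)} 2^{u_n d/2}$, which is how we read the (partly illegible) display above; the function name and the sample values are illustrative only.

```python
import math

def band_rate(n, t, d, u_n):
    """Assumed rate r_n(t) = (log n / n)^(t/(2t+d)) * 2^(u_n*d/2)."""
    return (math.log(n) / n) ** (t / (2.0 * t + d)) * 2.0 ** (u_n * d / 2.0)

# Effect of the sample size and of one extra level of undersmoothing on the sphere (d = 2).
r_small = band_rate(10 ** 3, t=1.0, d=2, u_n=0)
r_large = band_rate(10 ** 6, t=1.0, d=2, u_n=0)
r_under = band_rate(10 ** 6, t=1.0, d=2, u_n=1)
```

Each additional level of undersmoothing inflates the band by the factor $2^{d/2}$, which is the price paid for making the bias negligible.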

Note that the proof of the theorem, combined with standard arguments from adaptive estimation (e.g., Giné and Nickl [2010b]), implies as well that $\hat f_n$ is rate-adaptive in sup-norm loss, that is, for every $t > 0$,

$$\sup_{f : \|A_j(f) - f\|_{\mathbb M} \le b_2 2^{-jt}} E \sup_{x \in \mathbb M} |\hat f_n(x) - f(x)| = O(r_n(t)). \qquad (30)$$

The rate of convergence $r_n(t)$ cannot be improved over classes of functions that are $t$-Hölder, see for instance Klemelä [1999] in the case $\mathbb M = \mathbb S^d$, and since these Hölder classes are, up to constants, sets of the form $\{f : \|A_j f - f\|_\infty \le b_2 2^{-jt}\}$ for suitable $b_2$ (see the results in Geller and Pesenson [2010]), this implies that (30) is optimal, and that the band $C_n$ in Theorem 3 shrinks at the optimal rate in a minimax sense (up to the undersmoothing factor, which will typically be of size $\sqrt{\log n}$).






Clearly without a sharp evaluation of the probability of the event $\{f \in C_n\}$ Theorem 3 is useless for statistical inference. It is known (see Low [1997]) that adaptive confidence bands for densities on $\mathbb R$ cannot have coverage over a continuous scale $\bigcup_{t>0} \Sigma(t, b_2)$ of Hölder balls $\Sigma(t, b_2)$. In a way Low's results can be seen as an infinite-dimensional analogue of the pathologies in finite dimensions mentioned above. On the other hand recent results in Giné and Nickl [2010a] show that adaptation is possible over 'generic' subsets of $\bigcup_{t>0} \Sigma(t, b_2)$ when densities are estimated on the real line. The idea is that even if some pathologies cannot be avoided there are still exhaustive classes of densities for which adaptation is possible, and we show how this applies to density estimation on $\mathbb M$.

To this end we assume the following crucial approximation condition. While the upper bound is standard, the quantity occurring in the lower bound can be viewed as an infinite-dimensional analogue to the constant $b_1$ that appeared in Theorem 2. Note that whereas $b_1$ is always positive the lower bound in the following condition may fail to hold for any $t$ for a given continuous function $f$, at least for large enough $j$. We discuss this in Section 4.4.

Condition 1 Assume that $f : \mathbb M \to [0, \infty)$ is bounded and let $t, b_2 > 0$ be real numbers. Suppose that there exists a sequence $b(n)$ such that $0 < b(n) \le b_2$ for every $n \in \mathbb N$ and such that $f$ satisfies, for every $j \in \mathcal J_n$, the inequalities

$$b(n) 2^{-jt} \le \|A_j(f) - f\|_\Omega \le b_2 2^{-jt}. \qquad (31)$$

Under this condition we can prove asymptotic coverage of our nonparametric confidence band. We should note that inspection of the proof reveals that this coverage result is 'honest': it holds uniformly over classes of densities satisfying Condition 1.

Theorem 4 (Asymptotic Coverage) Let $\Omega$ be any subset of $\mathbb M$. Suppose $f$ satisfies Condition 1 and that the undersmoothing sequence $u_n \in \mathbb N$ is such that $u_n + t^{-1} \log_2(b(n)) \to \infty$ as $n \to \infty$. Then we have, for every $x > 0$,

$$\liminf_n \Pr\{f(y) \in C_n(x,y) \text{ for every } y \in \Omega\} \ge 1 - e^{-x}. \qquad (32)$$

For instance if one knows that $\liminf_n b(n) > 0$ (we shall see generic examples for this below) then any undersmoothing sequence $u_n \to \infty$ gives asymptotic coverage of the band. On the other hand if $u_n \to \infty$ then $b(n) \to 0$ is admissible and the lower bound requirement in Condition 1 becomes more and more lenient as sample size increases. This result and the discussion in Subsection 4.4 below shows that our nonparametric procedure does well asymptotically for 'typical' Hölder-continuous functions on the unit sphere.



4.3.3 A Nonasymptotic Coverage Result 

The asymptotic Theorem 4 is in fact a consequence of the following finite-sample result. While the stochastic terms are similarly well-behaved as in Theorems 1 and 2, the presence of nonnegligible approximation error is the reason why the following theorem is more intricate.

Theorem 5 (Finite Sample Coverage) Let $\Omega$ be any subset of $\mathbb M$. Suppose $f$ satisfies Condition 1 and let $m^* := m^*(f)$ be the smallest integer such that $b(n) 2^{t m^*} \ge 7 b_2$. Set $m := m_n(f) = \min(j_n^*(t), m^*)$. Then we have, for every $n \in \mathbb N$ and every $x > 0$

$$\Pr\{f(y) \in C_n(x, y) \text{ for every } y \in \Omega\} \ge 1 - e^{-x} - v_n \qquad (33)$$

where

$$v_n = 2(j_{\max} - m) n_2^{-\kappa} + X_n$$

with

$$X_n = \mathbf 1\left\{ 100 \sqrt{\frac{n_1}{n_2}} \, \frac{A(n_2, j_n^*(t) + 1, \kappa \log n_2)}{A(n_1, j_n^*(t) + u_n - m, x)} > 2^{(u_n - m - 1)(\frac{d}{2} + t)} \right\}$$

with $\kappa > 0$ equal to the constant from after (26) and where $A(n, l, x)$ was defined in (21).

Remark 1 [Undersmoothing in finite samples] Note first that if $u_n \ge m$, then the fraction on the l.h.s. of the inequality in the definition of $X_n$ is bounded away from zero and infinity. Consequently the tradeoff between the constants $u_n$ and $b(n)$ is such that if $u_n + t^{-1} \log_2(b(n)) \to \infty$ then $X_n = 0$ for all $n$ from some $n_0$ onwards, which in particular implies Theorem 4. Not surprisingly obtaining coverage in finite samples is more delicate, as $n_0$ depends on $f$: The undersmoothing constant $u_n$ should be chosen so large that $X_n = 0$ for every $n$. Closer inspection of $X_n$ shows that this is possible if an upper bound for $m$ is available, which can be obtained by requiring an a priori lower bound for the sequence $b(n)$ as well as for $t$. The
discussion in Section S3] will show that such apriori bounds can indeed be obtained in relevant 
cases. 

Remark 2 [Admissible lower bounds in Condition 1] Another point of view is to start with an undersmoothing sequence $u_n$ and to ask which sequences of $b(n)$'s are admissible to obtain coverage. Assume for simplicity that the sample size is $2n$ and that $n_1 = n_2 = n$. Let $C_n(\kappa \log n, y)$ be the confidence band from (27) with undersmoothing sequence $u_n \in \mathbb N$ and $x = \kappa \log n$. If $f$ satisfies Condition 1 and if

$$b(n) \ge 7 b_2 \cdot (100)^{t/(t+d/2)} \, 2^{(-u_n + 2)t},$$

then

$$\Pr\{f(y) \in C_n(\kappa \log n, y) \text{ for every } y \in \Omega\} \ge 1 - (2 j_{\max} + 3) n^{-\kappa}. \qquad (34)$$

For instance if d = 2 and / is at least once differentiable, then finite sample coverage holds 
for the set of densities that satisfy Condition Q] for 1 < t < oo and b(n) > b 2 ■ 2 8 ' 2_ '" n . 

Remark 3 [The role of the thresholding constant $\kappa$] The thresholding constant $\kappa$ plays an important role in the construction of $j_n$. Our results are presented for fixed $n$ without any restriction on this constant. This is an advantage since this constant has to be carefully chosen in applications. Our bounds typically contain a term of the form $n^{-\kappa}$, and one could be tempted to choose $\kappa$ as large as possible; however, it is important to notice that choosing $\kappa$ very large will increase the difficulty of cancelling $X_n$ in Theorem 5. An adaptive choice of this tuning constant is possible but beyond the scope of this paper.

4.4 Regularity of Functions on the Sphere and Condition 1

Condition 1 can be characterized in terms of classical Hölder regularity properties of the unknown density $f : \mathbb M \to \mathbb R$. We shall only discuss the case $\mathbb M = \mathbb S^d$, which is the case of primary statistical interest, but all findings below generalize to $\mathbb M$ with suitable modifications.

There are several ways to approximate unknown functions defined on $\mathbb S^d$, but it is a priori not clear whether a given method retrieves the natural intuition that the degree of smoothness of a function $f$ is the driving quantity of the approximation properties of $f$. For instance, while $L^2(\mathbb M)$-projections onto spherical harmonics constitute a way of approximating a continuous function $f : \mathbb S^d \to \mathbb R$, it is well known already from the special case $d = 1$ that this approximation may diverge at any given point $x$, which is particularly worrying when one is interested in the local or even uniform behavior of the approximation errors. Furthermore the important question arises whether the approximation method allows for very smooth (for instance infinitely differentiable) functions to be approximated in an optimal way.

The fact that needlets form a tight frame of $L^2(\mathbb S^d)$ implies good approximation properties in that space, similar to those of the spherical harmonics. Moreover, these approximations are also optimal approximands for differentiable and Hölder-continuous functions in the uniform norm on $\mathbb S^d$ (as follows from the results in Geller and Pesenson [2010]), so the upper bound in Condition 1 has a natural interpretation in terms of Hölder-Zygmund norms on $\mathbb S^d$.

The lower bound in Condition 1 is more intricate. The results in Jaffard [2000] and Giné and Nickl [2010a] for functions on $\mathbb R$ suggest that this condition should be satisfied if $f$ 'attains $t$ as its Hölder exponent' viewed as a function on the unit sphere (in fact a slightly stronger requirement is necessary). In the simplest case, if a real-valued function $f$ defined on $\mathbb R$ scales like $|x - x_0|^t$ at some point $x_0$ (if $t > 1$ a similar property has to hold for the highest existing derivative), then $f$ attains the Hölder exponent $t$, and the results in Jaffard [2000] imply that 'quasi every' function (in a Baire sense) in $C^t(\mathbb R)$ does this. Indeed Proposition 4 in Giné and Nickl [2010a] implies that quasi-every function in $C^t(\mathbb R)$ satisfies the lower bound in the $\mathbb R$-analogue of Condition 1 (where $A_j(f)$ has to be replaced by a corresponding wavelet projection). Proving such general results in the case where $f$ is defined on the sphere is technical, mostly since needlets only form a tight frame but not an orthonormal basis. We therefore return to the intuition of Hölder exponents and show that 'typical' $\alpha$-Hölder functions on $\mathbb S^d$ satisfy Condition 1: let us consider spherical analogues of functions on $\mathbb R$ that scale like $|x - x_0|$: If $x_0$ is any point in $\mathbb S^d$, then the zonal functions $d_{\mathbb S^d}(x,x_0)$ or $(1 - \langle x, x_0 \rangle_{d+1})^{1/2}$ are natural candidates for the class $C^1(\mathbb S^d)$. More generally

$$f_\alpha(x) = (1 - \langle x, x_0 \rangle_{d+1})^{\alpha/2}$$
for 0<a<oo,a/2^N, is a natural candidate for C a (S d ). We prove in Proposition O below 



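$$b_1 2^{-j\alpha} \le \|A_j(f_\alpha) - f_\alpha\|_\infty \le b_2 2^{-j\alpha}$$

for some fixed constants $0 < b_1 \le b_2 < \infty$. Note that obviously, for $\alpha = 2k$, $k \in \mathbb N$, $f_\alpha(x) = (1 - \langle x, x_0 \rangle_{d+1})^k = (1 - \cos(d_{\mathbb S^d}(x, x_0)))^k$ is actually a polynomial on $\mathbb S^d$.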



5 Proofs for Section [3] 

5.1 Proof of Theorem 1

If $f$ is the volume element of $\mathbb M$, then

$$\|A_j(f) - f\|_\infty = 0 \qquad (35)$$

for every $j \ge 0$. Clearly by definition of $j_n$

$$\Pr\{j_n \ne 0\} \le \sum_{l \in \mathcal J : l > 0} \Pr\{\|f_{n_2}(0) - f_{n_2}(l)\|_{\mathbb M} > 4\sigma(n,l)\}.$$

Now since $Ef_n(l) = A_l(f) = f$ for every $l \ge 0$, the $l$-th probability is bounded by

$$\Pr\{\|f_{n_2}(0) - f_{n_2}(l) - Ef_{n_2}(0) + Ef_{n_2}(l)\|_{\mathbb M} > 4\sigma(n,l)\}$$

$$\le \Pr\{\|f_{n_2}(0) - Ef_{n_2}(0)\|_{\mathbb M} > 2\sigma(n,l)\} + \Pr\{\|f_{n_2}(l) - Ef_{n_2}(l)\|_{\mathbb M} > 2\sigma(n,l)\} \le 2 n_2^{-\kappa}$$

in view of Proposition 3, so that $\Pr\{j_n \ne 0\} \le 2 j_{\max} n_2^{-\kappa}$ follows. To prove the second claim of the theorem, we have from independence of $j_n$ and $f_{n_1}$, from (35) and from Proposition 3

$$\Pr\{f(y) \in C_n(x,y) \text{ for every } y \in \mathbb M\} = \Pr\left\{ \sup_{y \in \mathbb M} |\hat f_n(y) - f(y)| \le s_n(x) \right\}$$

$$\ge 1 - \Pr\left\{ \sup_{y \in \mathbb M} |f_{n_1}(j_n + u_n, y) - E_1 f_{n_1}(j_n + u_n, y)| > \sigma(n_1, j_n + u_n, x) \right\}$$

$$= 1 - \sum_{0 \le l \le j_{\max}} \Pr\{\|f_{n_1}(l + u_n, \cdot) - E_1 f_{n_1}(l + u_n, \cdot)\|_\infty > \sigma(n_1, l + u_n, x)\} \Pr\{j_n = l\}$$

$$\ge 1 - e^{-x} \sum_{0 \le l \le j_{\max}} \Pr\{j_n = l\} = 1 - e^{-x}.$$

The last claim of Theorem 1 follows from the first and definition of $\sigma(n,l,x)$.
5.2 Proof of Theorem 2

Since $j_{\max} \to \infty$ as $n \to \infty$ and since this theorem is of an asymptotic nature we assume $J < j_{\max}$ in what follows. We recall from (3) that $f \in E_{2^{2J-1}}$ implies

$$\|A_l(f) - f\|_\infty = 0 \qquad (36)$$

for every $l \ge J$. Then

$$\Pr\{j_n > J\} \le \sum_{l \in \mathcal J : l > J} \Pr\{\|f_{n_2}(J) - f_{n_2}(l)\|_{\mathbb M} > 4\sigma(n,l)\},$$

and the $l$'th summand is bounded by

$$\Pr\{\|f_{n_2}(J) - f_{n_2}(l) - Ef_{n_2}(J) + Ef_{n_2}(l)\|_{\mathbb M} > 4\sigma(n,l)\}$$

$$\le \Pr\{\|f_{n_2}(J) - Ef_{n_2}(J)\|_{\mathbb M} > 2\sigma(n,l)\} + \Pr\{\|f_{n_2}(l) - Ef_{n_2}(l)\|_{\mathbb M} > 2\sigma(n,l)\} \le 2 n_2^{-\kappa}$$

in view of (36) and Proposition 3.

For integer $l < J - 1$ (so that $2^l < 2^{J-1}$) we have

$$\|A_l(f) - f\|_\infty \ge \inf_{p \in E_{2^{2J-2}}} \|p - f\|_\infty = b_1 > 0$$

since $A_l(f) \in E_{2^{2J-2}}$ and since $E_{2^{2J-2}}$ is a closed proper subspace of $E_{2^{2J-1}}$. By definition we have

$$\Pr(j_n = l) \le \Pr\left( \|f_{n_2}(l) - f_{n_2}(J)\|_{\mathbb M} \le 4\sigma(n, J) \right). \qquad (37)$$

The triangle inequality and (36) now give

$$\|f_{n_2}(l) - f_{n_2}(J)\|_{\mathbb M} \ge \|A_l(f) - f\|_\infty - \|f_{n_2}(l) - Ef_{n_2}(l) - f_{n_2}(J) + Ef_{n_2}(J)\|_{\mathbb M}$$

so that the probability in (37) is bounded by

$$\Pr\left( \|f_{n_2}(l) - Ef_{n_2}(l) - f_{n_2}(J) + Ef_{n_2}(J)\|_{\mathbb M} \ge b_1 - 4\sigma(n, J) \right)$$

$$\le \Pr\left( \|f_{n_2}(l) - Ef_{n_2}(l)\|_{\mathbb M} \ge \frac{b_1}{2} - 2\sigma(n, J) \right) + \Pr\left( \|f_{n_2}(J) - Ef_{n_2}(J)\|_{\mathbb M} \ge \frac{b_1}{2} - 2\sigma(n, J) \right).$$

For $n$ large enough depending on $b_1$ we have $2\sigma(n, J) < b_1/4$ so that Proposition 3 implies, for $J$ fixed, $\Pr\{j_n < J - 1\} \le \sum_{0 \le l < J-1} \Pr\{j_n = l\} \le 2Je^{-cn}$ for some constant $c > 0$ depending on $b_1, J$ and those constants appearing in the definition of $\sigma(n,l,x)$ that do not depend on $n, l$. Summarizing we deduce $\Pr\{j_n \notin [J - 1, J]\} \le 2 j_{\max} n_2^{-\kappa} + 2Je^{-cn}$ for $n$ large enough. To prove coverage we proceed as in Theorem 1, noting $u_n \ge 1$,

$$\Pr\{f(y) \in C_n(x,y) \text{ for every } y \in \mathbb M\} \ge 1 - \Pr\left\{ \sup_{y \in \mathbb M} |f_{n_1}(j_n + u_n, y) - f(y)| > \sigma(n_1, j_n + u_n, x) \right\}$$

$$\ge 1 - 2Je^{-cn} - \sum_{J-1 \le l \le j_{\max}} \Pr\{\|f_{n_1}(l + u_n, \cdot) - E_1 f_{n_1}(l + u_n, \cdot)\|_\infty > \sigma(n_1, l + u_n, x)\} \Pr\{j_n = l\}$$

$$\ge 1 - 2Je^{-cn} - e^{-x} \sum_{J-1 \le l \le j_{\max}} \Pr\{j_n = l\} \ge 1 - e^{-x} - 2Je^{-cn}$$

where we used (36) and Proposition 3. The last claim of the theorem is proved as in Theorem 1.

5.3 Proof of Theorems 4 and 5

We first prove Theorem 5. For $f$ satisfying Condition 1 there exists a unique $t := t(f)$ such that $f$ satisfies Condition 1 for this $t$. Define

$$B(j,t) = b_2 2^{-jt}, \qquad j_n^*(t) = \min\{j \in \mathcal J \setminus \{0\} : B(j,t) \le \sigma(n_2, j)\} - 1. \qquad (38)$$

If no $j \in \mathcal J$ exists such that $B(j,t) \le \sigma(n_2, j)$ we set $j_n^*(t) = j_{\max} - 1$. We shall assume without loss of generality that $b_2$ is large enough such that $b_2 > \sigma(1, 0)$. In this way $B(j_n^*(t), t) > \sigma(n_2, j_n^*(t))$ also holds when $j_n^*(t) = 0$.

It is easy to see that $j_n^*(t)$ satisfies

$$2^{j_n^*(t)} \simeq \left( \frac{n_2}{\log n_2} \right)^{\frac{1}{2t+d}} \qquad (39)$$

so is a 'rate optimal' resolution level for estimating $f$ satisfying Condition 1 for the given $t$.
The constants in the definition of jj^(t) depend only on b 2 ,t, a, d, k 2 and 
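The definition (38) of the balancing level can be sketched directly. This is an illustration under assumptions: `sigma_n2(j)` is a hypothetical callable standing for $\sigma(n_2, j)$, and the toy threshold used below is not the one from Proposition 3.

```python
def oracle_level(b2, t, sigma_n2, j_max):
    """j*_n(t) = min{ j in {1, ..., j_max} : b2 * 2^{-jt} <= sigma_n2(j) } - 1,
    with the convention j*_n(t) = j_max - 1 if no such j exists."""
    for j in range(1, j_max + 1):
        if b2 * 2.0 ** (-j * t) <= sigma_n2(j):
            return j - 1
    return j_max - 1

# Toy threshold growing like the 'Gaussian component' 2^{j/2}.
toy_sigma = lambda j: 0.01 * 2.0 ** (j / 2.0)
j_star = oracle_level(b2=1.0, t=1.0, sigma_n2=toy_sigma, j_max=10)
```

At the returned level the bias bound still exceeds the stochastic term, while one level higher it no longer does; this is the balance between bias and variance that the proofs exploit.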



Lemma 1 a) For every $n \in \mathbb N$,

$$\Pr(j_n > j_n^*(t) + 1) \le 2(j_{\max} - j_n^*(t)) n_2^{-\kappa}. \qquad (40)$$

b) Let $m := \min(j_n^*(t), m^*)$ where $m^*$ is the smallest integer such that $(b(n)/b_2) 2^{t m^*} \ge 7$. Then, for every $j \in \mathcal J$ satisfying $0 \le j < j_n^*(t) - m$ and every $n \in \mathbb N$ we have $\Pr(j_n = j) \le 2 n_2^{-\kappa}$. As a consequence, for every $n \in \mathbb N$,

$$\Pr(j_n < j_n^*(t) - m) \le 2(j_n^*(t) - m) n_2^{-\kappa}. \qquad (41)$$

Proof. Since this lemma only involves the sample $\mathcal S_2$, we set $n = n_2$ for notational simplicity. We also put $j_n^{*+} = j_n^*(t) + 1$. If $j_n^{*+} = j_{\max}$ Part a) is proved. Otherwise one has

$$\Pr(j_n > j_n^{*+}) \le \sum_{l \in \mathcal J : l > j_n^{*+}} \Pr\left( \|f_n(j_n^{*+}) - f_n(l)\|_\Omega > 4\sigma(n, l) \right).$$

We first observe that by Condition 1 (noting also $Ef_n(j) = A_j(f)$)

$$\|f_n(j_n^{*+}) - f_n(l)\|_\Omega \le \|f_n(j_n^{*+}) - f_n(l) - Ef_n(j_n^{*+}) + Ef_n(l)\|_\Omega + B(j_n^{*+}, t) + B(l, t),$$

and that

$$B(j_n^{*+}, t) + B(l, t) \le 2B(j_n^{*+}, t) \le 2\sigma(n, j_n^{*+}) \le 2\sigma(n, l)$$

by definition of $j_n^{*+}$ and since $l > j_n^{*+}$. Consequently, the $l$-th probability in the last sum is bounded by

$$\Pr\left( \|f_n(j_n^{*+}) - f_n(l) - Ef_n(j_n^{*+}) + Ef_n(l)\|_\Omega > 2\sigma(n, l) \right)$$

$$\le \Pr\left( \|f_n(j_n^{*+}) - Ef_n(j_n^{*+})\|_\Omega > \sigma(n,l) \right) + \Pr\left( \|f_n(l) - Ef_n(l)\|_\Omega > \sigma(n,l) \right) \le 2n^{-\kappa}$$

where we have used Proposition 3.

To prove the second claim, fix $j < j_n^*(t) - m$. Clearly we only have to consider the case $m = m^*$. Observe that

$$\Pr(j_n = j) \le \Pr\left( \|f_n(j) - f_n(j_n^*(t))\|_\Omega \le 4\sigma(n, j_n^*(t)) \right). \qquad (42)$$

Now using Condition 1 and the triangle inequality we deduce

$$\|f_n(j) - f_n(j_n^*(t))\|_\Omega \ge \frac{b(n)}{b_2} B(j,t) - B(j_n^*(t), t) - \|f_n(j) - Ef_n(j) - f_n(j_n^*(t)) + Ef_n(j_n^*(t))\|_\Omega$$

so that the probability in (42) is bounded by

$$\Pr\left( \|f_n(j) - Ef_n(j) - f_n(j_n^*(t)) + Ef_n(j_n^*(t))\|_\Omega \ge \frac{b(n)}{b_2} B(j,t) - B(j_n^*(t), t) - 4\sigma(n, j_n^*(t)) \right).$$

By definition of $m$ and $B(j,t)$, we have

$$\frac{b(n)}{b_2} B(j,t) - B(j_n^*(t), t) = \frac{b(n)}{b_2} 2^{t(j_n^*(t) - j)} B(j_n^*(t), t) - B(j_n^*(t), t) \ge \left( \frac{b(n)}{b_2} 2^{tm} - 1 \right) B(j_n^*(t), t)$$

as well as $B(j_n^*(t), t) > \sigma(n, j_n^*(t)) \ge \sigma(n, j)$ so that the last probability is bounded by

$$\Pr\left( \|f_n(j) - Ef_n(j) - f_n(j_n^*(t)) + Ef_n(j_n^*(t))\|_\Omega \ge \left( \frac{b(n)}{b_2} 2^{tm} - 5 \right) \sigma(n, j_n^*(t)) \right)$$

$$\le \Pr\left( \|f_n(j) - Ef_n(j)\|_\Omega \ge 2^{-1} \left( \frac{b(n)}{b_2} 2^{tm} - 5 \right) \sigma(n, j) \right) + \Pr\left( \|f_n(j_n^*(t)) - Ef_n(j_n^*(t))\|_\Omega \ge 2^{-1} \left( \frac{b(n)}{b_2} 2^{tm} - 5 \right) \sigma(n, j_n^*(t)) \right).$$

By definition of $m$, the term in brackets is greater than or equal to two, and then, using Proposition 3, the last two probabilities do not exceed $2n^{-\kappa}$. Moreover,

$$\Pr(j_n < j_n^*(t) - m) \le \sum_{0 \le j < j_n^*(t) - m} \Pr(j_n = j) \le 2 \sum_{0 \le j < j_n^*(t) - m} n^{-\kappa} \le 2(j_n^*(t) - m) n^{-\kappa},$$

which completes the proof. ■

Combining (40) with (41) we have, for every $n \in \mathbb{N}$ and for $m$ as in the lemma,

$$\Pr\big\{\hat j_n \notin [j_n^*(t) - m,\ j_n^*(t) + 1]\big\} \le 2\big[(j_n^*(t) - m) + (j_{\max} - j_n^*(t))\big] n_2^{-\kappa} = 2(j_{\max} - m)\, n_2^{-\kappa} := Z_n, \tag{43}$$

a fact we shall use below.
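To make the quantities in Lemma 1 and (43) concrete, the following minimal Python sketch computes $m^*$ and $Z_n$ for user-supplied inputs. The function names and all numerical values are hypothetical and chosen only for illustration; the search for $m^*$ starts at $0$, which is an assumption about the intended range of "smallest integer".

```python
def smallest_m_star(b_n, b_2, t):
    """Smallest integer m >= 0 with (b_n / b_2) * 2**(t * m) >= 7.

    Assumption: the search starts at m = 0 (nonnegative integers only).
    """
    if b_n <= 0 or b_2 <= 0 or t <= 0:
        raise ValueError("b_n, b_2 and t must be positive")
    m = 0
    while (b_n / b_2) * 2.0 ** (t * m) < 7.0:
        m += 1
    return m


def z_n(j_max, j_star, m_star, n_2, kappa):
    """Right-hand side of (43): 2 * (j_max - m) * n_2**(-kappa), m = min(j_star, m_star)."""
    m = min(j_star, m_star)
    return 2.0 * (j_max - m) * n_2 ** (-kappa)


# Hypothetical inputs, for illustration only.
m_star_example = smallest_m_star(b_n=0.5, b_2=2.0, t=1.0)
z_example = z_n(j_max=12, j_star=8, m_star=m_star_example, n_2=1000, kappa=2.0)
```

The bound $Z_n$ decays polynomially in $n_2$ while $j_{\max}$ typically grows only logarithmically, which is why the event in (43) is negligible in the arguments that follow.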

We now prove Theorem 5. Denoting by $E_1$ expectation w.r.t. $S_1$, one has by definition of $s_n(x)$ that

$$\Pr\{f(y) \in C_n(x,y) \text{ for every } y \in \Omega\}$$

$$= \Pr\Big\{\sup_{y \in \Omega} |f_n(y) - f(y)| \le s_n(x)\Big\}$$

$$= 1 - \Pr\Big\{\sup_{y \in \Omega} |f_n(y) - E_1 f_n(y) + E_1 f_n(y) - f(y)| > 1.01\,\sigma(n_1, \hat j_n + u_n, x)\Big\}$$

$$\ge 1 - \Pr\big\{\|f_n - E_1 f_n\|_\Omega > \sigma(n_1, \hat j_n + u_n, x)\big\} - \Pr\big\{\|E_1 f_n - f\|_\Omega > 0.01\,\sigma(n_1, \hat j_n + u_n, x)\big\}$$

$$= 1 - I - II.$$

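About term I: This probability equals, by independence of $f_{n_1}(j,y)$ and $\hat j_n$,

$$\Pr\big\{\|f_{n_1}(\hat j_n + u_n, \cdot) - E_1 f_{n_1}(\hat j_n + u_n, \cdot)\|_\Omega > \sigma(n_1, \hat j_n + u_n, x)\big\}$$

$$= \sum_{0 \le l \le j_{\max}} \Pr\big\{\|f_{n_1}(l + u_n, \cdot) - E_1 f_{n_1}(l + u_n, \cdot)\|_\Omega > \sigma(n_1, l + u_n, x)\big\}\, \Pr\{\hat j_n = l\}$$

$$\le e^{-x} \sum_{0 \le l \le j_{\max}} \Pr\{\hat j_n = l\} = e^{-x}$$

in view of Proposition 3.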

About term II: Using Condition 1 as well as (43), and recalling (21), this quantity equals

$$\Pr\big\{\|E f_{n_1}(\hat j_n + u_n) - f\|_\Omega > 0.01\,\sigma(n_1, \hat j_n + u_n, x)\big\}$$

$$\le \Pr\big\{100\, b_2\, 2^{-t(\hat j_n + u_n)} > \sigma(n_1, \hat j_n + u_n, x)\big\}$$

$$= \Pr\big\{100 \sqrt{n_1}\, b_2 > 2^{(\hat j_n + u_n)(\frac d2 + t)} A(n_1, \hat j_n + u_n, x)\big\}$$

$$\le \sum_{j_n^*(t) - m \le l \le j_n^*(t) + 1} I\big\{100 \sqrt{n_1}\, b_2 > 2^{(l + u_n)(\frac d2 + t)} A(n_1, l + u_n, x)\big\}\, \Pr\{\hat j_n = l\} + Z_n$$

$$\le I\big\{100\, b_2 \sqrt{n_1} > 2^{(j_n^*(t) + 1)(\frac d2 + t)}\, 2^{(u_n - m - 1)(\frac d2 + t)} A(n_1, j_n^*(t) + u_n - m, x)\big\} + Z_n$$

$$\le I\Big\{100 \sqrt{\frac{n_1}{n_2}}\, \frac{A(n_2, j_n^*(t) + 1, \kappa \log n_2)}{A(n_1, j_n^*(t) + u_n - m, x)} > 2^{(u_n - m - 1)(\frac d2 + t)}\Big\} + Z_n$$

where we have used that (38) implies

$$2^{(j_n^*(t) + 1)(\frac d2 + t)} \ge \frac{\sqrt{n_2}\, b_2}{A(n_2, j_n^*(t) + 1, \kappa \log n_2)}$$

in the last inequality. This proves Theorem 5. Theorem 0] follows from Theorem 5 using that
tradeoff between $b(n)$ and $u_n$ through the constant $m$ (cf. also Remark [T]).

5.4 Proof of Theorem 3

The size of the band is $2.02\,\sigma(n_1, \hat j_n + u_n, x)$. In view of (40) - whose proof only requires the
hypotheses of Theorem 3 - we have $\hat j_n \le j_n^*(t) + 1$ with probability larger than $1 - 2(j_{\max} - j_n^*(t))\, n_2^{-\kappa}$,
so the size of this band is less than or equal to $2.02\,\sigma(n_1, j_n^*(t) + u_n + 1, x)$ with
the same probability bound. The second claim of Theorem 3 then follows from definition of
$\sigma(n,l,x)$ (cf. (21)) and of $j_n^*(t)$ (cf.



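6 Precise Validity of Condition (31)

In this section we investigate examples of functions verifying condition (31) if $M = \mathbb{S}^d$. Let
us recall that the projection kernel on $\mathcal{H}_k(\mathbb{S}^d)$ is given by

$$L_k(\langle x,y\rangle_{d+1}) = \frac{1}{|\mathbb{S}^d|}\Big(1 + \frac{k}{\nu}\Big) C_k^\nu(\langle x,y\rangle_{d+1}), \qquad \nu = \frac{d-1}{2},$$

where $C_k^\nu(x)$ is the corresponding Gegenbauer polynomial. For ease of notation we shall
redefine $A_j(x,y) = \sum_k a(k/2^j) L_k(x,y)$, to be in line with the notation in Narcowich et al. [2006a,b], Baldi et al. [2009]. [For $j \to \infty$ this modification is immaterial.] We shall use the
classical symbol

$$\forall k \in \mathbb{N}, \quad (a)_k = a(a+1)\cdots(a+k-1) \ \Big(= \frac{\Gamma(a+k)}{\Gamma(a)} \text{ if } -a \notin \mathbb{N}\Big), \qquad (a)_0 = 1.$$

The following Olinde Rodrigues formula defines the Gegenbauer polynomials and is useful
for integration by parts: for $t \in I = [-1, 1]$,

$$C_k^\nu(t)\, \omega_\nu(t) = \frac{(-1)^k (2\nu)_k}{k!\, 2^k\, (\nu + \frac12)_k}\, D^k\big\{(1 - t^2)^k \omega_\nu(t)\big\}, \qquad \omega_\nu(t) = (1 - t^2)^{\nu - 1/2}.$$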

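Proposition 6 For $0 < \alpha < \infty$, $\alpha/2 \notin \mathbb{N}$, we define the following functions:

$$f_\alpha(x) = \Big(\sqrt{1 - \langle x, x_0\rangle_{d+1}}\Big)^\alpha = \Big(\sqrt{1 - \cos(d_{\mathbb{S}^d}(x, x_0))}\Big)^\alpha,$$

where $d_{\mathbb{S}^d}$ is the geodesic distance on $\mathbb{S}^d$. Then there exist constants $c_1 > 0$, $c_2 > 0$ independent
of $j$ such that

$$c_1 2^{-j\alpha} \le \|A_j(f_\alpha) - f_\alpha\|_\infty \le c_2 2^{-j\alpha}.$$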
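Proof of the upper bound: Let us consider first the case $0 < \alpha \le 1$. We have

$$\begin{aligned}
|A_j(f_\alpha)(x) - f_\alpha(x)| &= \Big|\int_{\mathbb{S}^d} A_j(x,y) f_\alpha(y)\,dy - f_\alpha(x)\Big| \\
&= \Big|\int_{\mathbb{S}^d} A_j(x,y)\big(f_\alpha(y) - f_\alpha(x)\big)\,dy\Big| \\
&\le \int_{\mathbb{S}^d} |A_j(x,y)|\,|f_\alpha(y) - f_\alpha(x)|\,dy.
\end{aligned}$$

But

$$\forall \theta, \theta' \in [0, \pi], \quad \big|\sqrt{1 - \cos\theta} - \sqrt{1 - \cos\theta'}\big| = \sqrt{2}\,\Big|\sin\frac{\theta}{2} - \sin\frac{\theta'}{2}\Big| \le \frac{1}{\sqrt 2}|\theta - \theta'|,$$

so

$$|f_1(x) - f_1(y)| = \Big|\sqrt{1 - \cos(d_{\mathbb{S}^d}(x, x_0))} - \sqrt{1 - \cos(d_{\mathbb{S}^d}(y, x_0))}\Big| \le \frac{1}{\sqrt 2}\big|d_{\mathbb{S}^d}(x, x_0) - d_{\mathbb{S}^d}(y, x_0)\big| \le \frac{1}{\sqrt 2}\, d_{\mathbb{S}^d}(x,y).$$

And, by the subadditivity of $x \mapsto x^\alpha$ for $0 < \alpha \le 1$,

$$|f_\alpha(x) - f_\alpha(y)| = |f_1^\alpha(x) - f_1^\alpha(y)| \le |f_1(x) - f_1(y)|^\alpha \le 2^{-\alpha/2}\big(d_{\mathbb{S}^d}(x,y)\big)^\alpha.$$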
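The Hölder-type bound just derived can be checked numerically on the one-dimensional profile $\theta \mapsto (\sqrt{1 - \cos\theta})^\alpha$. The sketch below is only a sanity check under stated assumptions: the function names, the random sampling scheme, and the chosen values of $\alpha$ are illustrative and not part of the original argument. It uses the identity $\sqrt{1 - \cos\theta} = \sqrt{2}\,|\sin(\theta/2)|$ for numerical stability.

```python
import math
import random


def phi(theta, alpha):
    """Profile (sqrt(1 - cos(theta)))**alpha, evaluated as (sqrt(2)*|sin(theta/2)|)**alpha."""
    return (math.sqrt(2.0) * abs(math.sin(theta / 2.0))) ** alpha


def worst_holder_ratio(alpha, trials=20000, seed=0):
    """Largest observed |phi(a) - phi(b)| / (2**(-alpha/2) * |a - b|**alpha), a, b in [0, pi].

    For 0 < alpha <= 1 the inequality in the text says this ratio never exceeds 1.
    """
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        a = rng.uniform(0.0, math.pi)
        b = rng.uniform(0.0, math.pi)
        if a == b:
            continue  # ratio undefined when the two angles coincide
        bound = 2.0 ** (-alpha / 2.0) * abs(a - b) ** alpha
        worst = max(worst, abs(phi(a, alpha) - phi(b, alpha)) / bound)
    return worst
```

A call such as `worst_holder_ratio(0.5)` returns the largest ratio seen in the random sample; values at or below one (up to rounding) are consistent with the bound, although a random check of this kind is of course not a proof.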
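So, by the integration formula for zonal functions on the sphere (Section 9.1 in Faraut [2008]):

$$\begin{aligned}
\forall x \in \mathbb{S}^d, \quad \int_{\mathbb{S}^d} |A_j(x,y)|\,|f_\alpha(y) - f_\alpha(x)|\,dy &\le 2^{-\alpha/2} \int_{\mathbb{S}^d} |A_j(\langle x,y\rangle_{d+1})|\,\big(d_{\mathbb{S}^d}(x,y)\big)^\alpha\,dy \\
&= 2^{-\alpha/2}\, |\mathbb{S}^{d-1}| \int_0^\pi |A_j(\cos\theta)|\, \theta^\alpha (\sin\theta)^{d-1}\,d\theta \\
&\le 2^{-\alpha/2}\, |\mathbb{S}^{d-1}| \int_0^\pi |A_j(\cos\theta)|\, \theta^{d-1+\alpha}\,d\theta.
\end{aligned}$$

But using the concentration result [Narcowich et al., 2006b]

$$\forall K > 0,\ \exists\, C_K < \infty, \quad |A_j(\cos\theta)| \le C_K\, 2^{jd}\, \big[1 \wedge 1/(2^j\theta)^K\big].$$

Taking $K > d + \alpha$, we obtain

$$\begin{aligned}
\|A_j(f_\alpha) - f_\alpha\|_\infty &\le 2^{-\alpha/2}\, |\mathbb{S}^{d-1}|\, C_K\, 2^{jd} \Big\{\int_0^{2^{-j}} \theta^{d-1+\alpha}\,d\theta + \int_{2^{-j}}^{\pi} \frac{\theta^{d-1+\alpha}}{(2^j\theta)^K}\,d\theta\Big\} \\
&\le 2^{-\alpha/2}\, |\mathbb{S}^{d-1}|\, C_K\, 2^{-j\alpha}\, \frac{K}{(d+\alpha)(K-d-\alpha)}.
\end{aligned}$$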
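For completeness, the constant $K/((d+\alpha)(K-d-\alpha))$ can be traced through the two integrals as follows. This is a worked version of the elementary computation, written out here as a reading aid; extending the second integral from $\pi$ to $\infty$ is a simplification that only enlarges the bound.

```latex
\begin{align*}
2^{jd}\int_0^{2^{-j}} \theta^{d-1+\alpha}\,d\theta
  &= \frac{2^{jd}\,2^{-j(d+\alpha)}}{d+\alpha}
   = \frac{2^{-j\alpha}}{d+\alpha},\\
2^{jd}\int_{2^{-j}}^{\pi} \frac{\theta^{d-1+\alpha}}{(2^j\theta)^K}\,d\theta
  &\le 2^{j(d-K)}\int_{2^{-j}}^{\infty} \theta^{d-1+\alpha-K}\,d\theta
   = \frac{2^{j(d-K)}\,2^{j(K-d-\alpha)}}{K-d-\alpha}
   = \frac{2^{-j\alpha}}{K-d-\alpha},\\
\frac{1}{d+\alpha} + \frac{1}{K-d-\alpha}
  &= \frac{K}{(d+\alpha)(K-d-\alpha)}.
\end{align*}
```

The condition $K > d + \alpha$ is exactly what makes the second integral converge at infinity.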

Let us now consider the case $\alpha > 1$. Taking $d = 2$ the previous proof shows that, on
the classical torus $\mathbb{T}$, for $0 < \alpha \le 1$, the $2\pi$-periodic function $\phi_\alpha(\theta) = (\sqrt{1 - \cos\theta})^\alpha = 2^{\alpha/2}|\sin\frac{\theta}{2}|^\alpha$ belongs to $C^\alpha(\mathbb{T})$. But, if for $k$ in $\mathbb{N}$, $\alpha$ equals $\alpha = k + \beta \le k + 1$, it is clear
that $\phi_\alpha(\theta)$ is $k$-times differentiable, and $D^k\phi_\alpha(\theta)$, as a linear combination of $C^\infty$ periodic
functions times $|\sin\frac{\theta}{2}|^{\beta + j}$, $j = 0, 1, \dots, k$, belongs to $C^\beta(\mathbb{T})$. So $\phi_\alpha \in C^\alpha(\mathbb{T})$, and, as moreover
$\phi_\alpha(\theta)$ is even, there exists $P_j(\cos\theta)$, a sequence of trigonometric polynomials of degree less
than $2^j$, such that:

$$\big\|(\sqrt{1 - \cos\theta})^\alpha - P_j(\cos\theta)\big\|_\infty \le C 2^{-j\alpha}.$$

But $P_j(\langle x, x_0\rangle_{d+1})$ is a polynomial on the sphere of degree less than $2^j$ and

$$\Big\|\big(\sqrt{1 - \langle x, x_0\rangle_{d+1}}\big)^\alpha - P_j(\langle x, x_0\rangle_{d+1})\Big\|_\infty \le C 2^{-j\alpha}.$$

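Proof of the lower bound: We only have to consider the case $j$ large enough since $f_\alpha$ is not
a spherical polynomial and thus not in $E_N(\mathbb{S}^d)$ for any finite $N$. Using again the integration
formulae for zonal functions,

$$\begin{aligned}
\|A_j(f_\alpha) - f_\alpha\|_\infty &\ge |A_j(f_\alpha)(x_0) - f_\alpha(x_0)| \\
&= \Big|\int_{\mathbb{S}^d} A_j(x_0,y)\big(f_\alpha(y) - f_\alpha(x_0)\big)\,dy\Big| \\
&= \Big|\int_{\mathbb{S}^d} A_j(x_0,y)\big(\sqrt{1 - \langle y, x_0\rangle_{d+1}}\big)^\alpha\,dy\Big| \\
&= |\mathbb{S}^{d-1}|\, \Big|\int_0^\pi A_j(\cos\theta)\big(\sqrt{1 - \cos\theta}\big)^\alpha (\sin\theta)^{d-1}\,d\theta\Big| \\
&= |\mathbb{S}^{d-1}|\, \Big|\int_{-1}^1 A_j(t)(1-t)^{\alpha/2}(1-t^2)^{\nu - 1/2}\,dt\Big| \\
&= \frac{|\mathbb{S}^{d-1}|}{|\mathbb{S}^d|}\, \Big|\sum_{0 \le k \le 2^j} a\Big(\frac{k}{2^j}\Big)\Big(1 + \frac{k}{\nu}\Big) \int_{-1}^1 C_k^\nu(t)(1-t)^{\alpha/2}(1-t^2)^{\nu - 1/2}\,dt\Big|.
\end{aligned}$$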

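But, using the Rodrigues formula and integrating by parts $k$ times,

$$\begin{aligned}
u_k := \int_{-1}^1 C_k^\nu(t)(1-t)^{\alpha/2}(1-t^2)^{\nu-1/2}\,dt &= \int_{-1}^1 C_k^\nu(t)(1-t)^{\alpha/2}\omega_\nu(t)\,dt \\
&= \frac{(-1)^k (2\nu)_k}{k!\,2^k\,(\nu + \frac12)_k} \int_{-1}^1 (1-t)^{\alpha/2}\, D^k\big\{(1-t^2)^k\omega_\nu(t)\big\}\,dt \\
&= \frac{1}{k!\,2^k}\, \frac{(2\nu)_k}{(\nu + \frac12)_k} \int_{-1}^1 D^k\big\{(1-t)^{\alpha/2}\big\}\,(1-t^2)^k\omega_\nu(t)\,dt.
\end{aligned}$$

Clearly $\forall k \ge 0$, $u_k \ne 0$ (because $\alpha/2 \notin \mathbb{N}$), $u_k = (-1)^k|u_k|$ for $0 \le k < \frac{\alpha}{2} + 1$ and $u_k = -(-1)^{[\alpha/2]}|u_k|$ for $k > \frac{\alpha}{2} + 1$. By the upper bound, and for $j$ large enough:

$$\begin{aligned}
C 2^{-j\alpha} \ge \|A_j f_\alpha - f_\alpha\|_\infty &\ge \frac{|\mathbb{S}^{d-1}|}{|\mathbb{S}^d|}\, \Big|\sum_{0 \le k \le 2^j} a\Big(\frac{k}{2^j}\Big)\Big(1 + \frac{k}{\nu}\Big) u_k\Big| \\
&= \frac{|\mathbb{S}^{d-1}|}{|\mathbb{S}^d|}\, \Big|\sum_{0 \le k < \alpha/2 + 1} a\Big(\frac{k}{2^j}\Big)\Big(1 + \frac{k}{\nu}\Big)(-1)^k|u_k| - (-1)^{[\alpha/2]} \sum_{\alpha/2 + 1 < k \le 2^j} a\Big(\frac{k}{2^j}\Big)\Big(1 + \frac{k}{\nu}\Big)|u_k|\Big| \\
&= \frac{|\mathbb{S}^{d-1}|}{|\mathbb{S}^d|}\, \Big|\sum_{0 \le k < \alpha/2 + 1} \Big(1 + \frac{k}{\nu}\Big) u_k - (-1)^{[\alpha/2]} \sum_{\alpha/2 + 1 < k \le 2^j} a\Big(\frac{k}{2^j}\Big)\Big(1 + \frac{k}{\nu}\Big)|u_k|\Big|.
\end{aligned}$$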

0<fc<[a/2] a/2+l<fc<2J 
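So if $[\alpha/2]$ is even, and $j$ large enough

$$\begin{aligned}
\Big|\sum_{0 \le k \le 2^j} a\Big(\frac{k}{2^j}\Big)\Big(1 + \frac{k}{\nu}\Big) u_k\Big| &= \sum_{0 \le k < \alpha/2 + 1} \Big(1 + \frac{k}{\nu}\Big) u_k - \sum_{\alpha/2 + 1 < k \le 2^j} a\Big(\frac{k}{2^j}\Big)\Big(1 + \frac{k}{\nu}\Big)|u_k| \\
&= \sum_{0 \le k < \alpha/2 + 1} \Big(1 + \frac{k}{\nu}\Big) u_k + \sum_{\alpha/2 + 1 < k \le 2^j} a\Big(\frac{k}{2^j}\Big)\Big(1 + \frac{k}{\nu}\Big) u_k.
\end{aligned}$$

So

$$\sum_{0 \le k \le 2^j} \Big(1 + \frac{k}{\nu}\Big) u_k \le \Big|\sum_{0 \le k \le 2^j} a\Big(\frac{k}{2^j}\Big)\Big(1 + \frac{k}{\nu}\Big) u_k\Big| \le \sum_{0 \le k \le 2^{j-1}} \Big(1 + \frac{k}{\nu}\Big) u_k.$$

Now, if $[\alpha/2]$ is odd, and $j$ large enough

$$\begin{aligned}
\Big|\sum_{0 \le k \le 2^j} a\Big(\frac{k}{2^j}\Big)\Big(1 + \frac{k}{\nu}\Big) u_k\Big| &= -\Big(\sum_{0 \le k < \alpha/2 + 1} \Big(1 + \frac{k}{\nu}\Big) u_k + \sum_{\alpha/2 + 1 < k \le 2^j} a\Big(\frac{k}{2^j}\Big)\Big(1 + \frac{k}{\nu}\Big)|u_k|\Big) \\
&= -\Big(\sum_{0 \le k < \alpha/2 + 1} \Big(1 + \frac{k}{\nu}\Big) u_k + \sum_{\alpha/2 + 1 < k \le 2^j} a\Big(\frac{k}{2^j}\Big)\Big(1 + \frac{k}{\nu}\Big) u_k\Big).
\end{aligned}$$

So

$$-\sum_{0 \le k \le 2^j} \Big(1 + \frac{k}{\nu}\Big) u_k \le \Big|\sum_{0 \le k \le 2^j} a\Big(\frac{k}{2^j}\Big)\Big(1 + \frac{k}{\nu}\Big) u_k\Big| \le -\sum_{0 \le k \le 2^{j-1}} \Big(1 + \frac{k}{\nu}\Big) u_k,$$

and in any case

$$\Big|\sum_{0 \le k \le 2^j} a\Big(\frac{k}{2^j}\Big)\Big(1 + \frac{k}{\nu}\Big) u_k\Big| \sim \Big|\sum_{0 \le k \le 2^j} \Big(1 + \frac{k}{\nu}\Big) \int_{-1}^1 C_k^\nu(t)(1-t)^{\alpha/2}(1-t^2)^{\nu - 1/2}\,dt\Big|.$$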



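Denote now by $\langle\cdot,\cdot\rangle_\nu$ the $L^2([-1,1])$-inner product w.r.t. $\omega_\nu$ and recall (see Andrews et al. [1999] p.343)

$$\sum_{0 \le k \le n} \Big(1 + \frac{k}{\nu}\Big) C_k^\nu(x) = \frac{(n + 2\nu)\, C_n^\nu(x) - (n+1)\, C_{n+1}^\nu(x)}{2\nu(1 - x)}$$

so that

$$\Big\langle \sum_{0 \le k \le n} \Big(1 + \frac{k}{\nu}\Big) C_k^\nu(x),\ (1-x)^{\alpha/2}\Big\rangle_\nu = \frac{1}{2\nu}\Big[(n + 2\nu)\big\langle C_n^\nu(x), (1-x)^{\alpha/2 - 1}\big\rangle_\nu - (n+1)\big\langle C_{n+1}^\nu(x), (1-x)^{\alpha/2 - 1}\big\rangle_\nu\Big].$$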
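The summation identity for Gegenbauer polynomials used here can be verified numerically. The sketch below evaluates $C_k^\nu$ through the standard three-term recurrence $k\,C_k^\nu(x) = 2(k+\nu-1)\,x\,C_{k-1}^\nu(x) - (k+2\nu-2)\,C_{k-2}^\nu(x)$ and compares both sides; the function names and the test points are illustrative assumptions, and the check requires $\nu > 0$ and $x \ne 1$.

```python
def gegenbauer_values(n_max, nu, x):
    """Return [C_0^nu(x), ..., C_{n_max}^nu(x)] via the three-term recurrence."""
    vals = [1.0, 2.0 * nu * x]
    for k in range(2, n_max + 1):
        vals.append((2.0 * (k + nu - 1.0) * x * vals[k - 1]
                     - (k + 2.0 * nu - 2.0) * vals[k - 2]) / k)
    return vals[: n_max + 1]


def partial_sum(n, nu, x):
    """Left-hand side: sum over 0 <= k <= n of (1 + k/nu) * C_k^nu(x)."""
    c = gegenbauer_values(n, nu, x)
    return sum((1.0 + k / nu) * c[k] for k in range(n + 1))


def closed_form(n, nu, x):
    """Right-hand side: ((n + 2 nu) C_n - (n + 1) C_{n+1}) / (2 nu (1 - x))."""
    c = gegenbauer_values(n + 1, nu, x)
    return ((n + 2.0 * nu) * c[n] - (n + 1.0) * c[n + 1]) / (2.0 * nu * (1.0 - x))
```

For instance, comparing `partial_sum(7, 1.0, 0.3)` with `closed_form(7, 1.0, 0.3)` (the case $d = 3$, $\nu = 1$) should give agreement up to floating-point rounding.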

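Moreover, for every $k$,

$$\begin{aligned}
\big\langle C_k^\nu(x), (1-x)^{\alpha/2 - 1}\big\rangle_\nu &= \frac{(-1)^k (2\nu)_k}{k!\,2^k\,(\nu + \frac12)_k} \int_{-1}^1 (1-t)^{\alpha/2 - 1}\, D^k\big((1-t^2)^k\omega_\nu(t)\big)\,dt \\
&= \frac{1}{k!\,2^k}\, \frac{(2\nu)_k}{(\nu + \frac12)_k}\, \Big(1 - \frac{\alpha}{2}\Big)_k \int_{-1}^1 (1-t)^{\alpha/2 - 1 - k}(1-t^2)^k (1-t^2)^{\nu - 1/2}\,dt \\
&= \frac{1}{k!\,2^k}\, \frac{\Gamma(2\nu + k)}{\Gamma(2\nu)}\, \frac{\Gamma(\nu + \frac12)}{\Gamma(\nu + k + \frac12)}\, \frac{\Gamma(-\frac{\alpha}{2} + k + 1)}{\Gamma(1 - \frac{\alpha}{2})} \int_{-1}^1 (1-t)^{\nu + \alpha/2 - 3/2}(1+t)^{\nu - 1/2 + k}\,dt \\
&= \frac{1}{k!\,2^k}\, \frac{\Gamma(2\nu + k)}{\Gamma(2\nu)}\, \frac{\Gamma(\nu + \frac12)}{\Gamma(\nu + k + \frac12)}\, \frac{\Gamma(-\frac{\alpha}{2} + k + 1)}{\Gamma(1 - \frac{\alpha}{2})}\, 2^{2\nu + \alpha/2 + k - 1}\, \frac{\Gamma(\nu + \frac{\alpha}{2} - \frac12)\,\Gamma(\nu + k + \frac12)}{\Gamma(2\nu + k + \frac{\alpha}{2})} \\
&= \frac{2^{\alpha/2} \sin(\frac{\pi\alpha}{2})\, \Gamma(\nu + \frac{\alpha}{2} - \frac12)\, \Gamma(\alpha/2)}{\Gamma(\nu)\sqrt{\pi}}\, \frac{\Gamma(k + 1 - \frac{\alpha}{2})\, \Gamma(2\nu + k)}{k!\, \Gamma(2\nu + k + \frac{\alpha}{2})}.
\end{aligned}$$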

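Here we have used the following standard formulae:

$$\Gamma(1 - \alpha/2)\,\Gamma(\alpha/2) = \frac{\pi}{\sin(\frac{\pi\alpha}{2})}; \qquad \Gamma(2\nu)\sqrt{\pi} = 2^{2\nu - 1}\,\Gamma(\nu)\,\Gamma(\nu + 1/2).$$

We deduce

$$\begin{aligned}
\Big\langle \sum_{0 \le k \le n} \Big(1 + \frac{k}{\nu}\Big) C_k^\nu(x),\ (1-x)^{\alpha/2}\Big\rangle_\nu &= \frac{2^{\alpha/2} \sin(\frac{\pi\alpha}{2})\, \Gamma(\nu + \frac{\alpha}{2} - \frac12)\, \Gamma(\alpha/2)}{2\nu\, \Gamma(\nu)\sqrt{\pi}}\, \frac{1}{n!}\, \frac{(n + 2\nu)\, \Gamma(n + 1 - \frac{\alpha}{2})\, \Gamma(2\nu + n)}{\Gamma(2\nu + n + \frac{\alpha}{2})} \Big\{1 - \frac{n + 1 - \frac{\alpha}{2}}{2\nu + n + \frac{\alpha}{2}}\Big\} \\
&= \frac{(2\nu - 1 + \alpha)\, 2^{\alpha/2} \sin(\frac{\pi\alpha}{2})\, \Gamma(\nu + \frac{\alpha}{2} - \frac12)\, \Gamma(\alpha/2)}{2\nu\, \Gamma(\nu)\sqrt{\pi}}\, \frac{(n + 2\nu)\, \Gamma(n + 1 - \frac{\alpha}{2})\, \Gamma(2\nu + n)}{(n + 2\nu + \alpha/2)\, n!\, \Gamma(2\nu + n + \frac{\alpha}{2})} \\
&= \sin\Big(\frac{\pi\alpha}{2}\Big)\, C(\alpha,\nu)\, \frac{n + 2\nu}{n + 2\nu + \alpha/2}\, \frac{\Gamma(n + 1 - \frac{\alpha}{2})\, \Gamma(2\nu + n)}{n!\, \Gamma(2\nu + n + \frac{\alpha}{2})},
\end{aligned}$$

where $C(\alpha,\nu)$ collects the remaining factors, which do not depend on $n$. Clearly $\sin(\frac{\pi\alpha}{2})$ determines the sign, and by Stirling's formula:

$$\frac{\Gamma(n + 1 - \frac{\alpha}{2})\, \Gamma(2\nu + n)}{n!\, \Gamma(2\nu + n + \frac{\alpha}{2})} \sim n^{-\alpha}.$$

So the lower bound of $\|A_j(f_\alpha) - f_\alpha\|_\infty$ is of order $2^{-j\alpha}$.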
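The Stirling asymptotics and the two Gamma-function formulae used above are easy to confirm numerically. The following sketch relies only on `math.lgamma` and `math.gamma` from the Python standard library; the function names and the parameter values ($\alpha = 1.5$, $\nu = 1$) are illustrative assumptions, and the log-Gamma route assumes all arguments are positive.

```python
import math


def gamma_ratio(n, alpha, nu):
    """Gamma(n+1-alpha/2) * Gamma(2nu+n) / (n! * Gamma(2nu+n+alpha/2)), via log-Gamma.

    Assumes n + 1 - alpha/2 > 0 and 2*nu + n > 0 so every argument is positive.
    """
    log_ratio = (math.lgamma(n + 1.0 - alpha / 2.0) + math.lgamma(2.0 * nu + n)
                 - math.lgamma(n + 1.0) - math.lgamma(2.0 * nu + n + alpha / 2.0))
    return math.exp(log_ratio)


def scaled_ratio(n, alpha, nu):
    """gamma_ratio times n**alpha; this should approach 1 as n grows."""
    return gamma_ratio(n, alpha, nu) * n ** alpha


def reflection_gap(alpha):
    """Gamma(1 - alpha/2) * Gamma(alpha/2) - pi / sin(pi * alpha / 2)."""
    return (math.gamma(1.0 - alpha / 2.0) * math.gamma(alpha / 2.0)
            - math.pi / math.sin(math.pi * alpha / 2.0))


def duplication_gap(nu):
    """Gamma(2 nu) * sqrt(pi) - 2**(2 nu - 1) * Gamma(nu) * Gamma(nu + 1/2)."""
    return (math.gamma(2.0 * nu) * math.sqrt(math.pi)
            - 2.0 ** (2.0 * nu - 1.0) * math.gamma(nu) * math.gamma(nu + 0.5))
```

Evaluating `scaled_ratio(n, 1.5, 1.0)` for increasing `n` shows the product settling near one, in line with the rate $n^{-\alpha}$, which with $n = 2^j$ gives the order $2^{-j\alpha}$.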